I’m using a PostgreSQL (15.5) database with the following logic:
We have "meters" that measure electrical consumption.
We have "computed meters" that are calculated from the measurements of several meters. For example, "computedMeterA" = "meterA" + "meterB".
I have classical "dimension" tables for meters (2,500 records) and computed meters, with id, name, …, and a fact_consumption table with the hourly data for each meter (about 20 million records).
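For reference, here is a simplified sketch of those tables, limited to the columns used in the view below (the real tables have more columns, and some names may differ slightly):
create table dim_meter (
    id_meter integer primary key,
    meter_name text,                      -- simplified, the real dimension has more attributes
    conversion_factor double precision
);
create table dim_computed_meter (
    id_computed_meter integer primary key,
    computed_meter_name text
);
create table fact_consumption (
    ts timestamp not null,                -- hourly timestamp
    id_meter integer references dim_meter,
    gross_consumption double precision
);  -- in reality a TimescaleDB hypertable partitioned on ts (see the note below)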
The combinations that make up the computed meters are stored in a parameters table with the following structure:
create table param_computed_meters (
    id integer not null
        constraint param_computed_meters_pk
            primary key,
    id_computed_meter integer
        constraint param_computed_meters_id_computed_meter_fk
            references powerbi.dim_computed_meter
            on delete cascade,
    id_meter integer,
    factor double precision
);
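For example, "computedMeterA" = "meterA" + "meterB" ends up as two rows pointing at the same computed meter, each with factor = 1 (the ids below are purely illustrative):
insert into param_computed_meters (id, id_computed_meter, id_meter, factor)
values (1, 1, 10, 1.0),   -- computedMeterA (id 1) takes meterA (id 10) ...
       (2, 1, 11, 1.0);   -- ... plus meterB (id 11)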
And I created a view that calculates the consumption of the computed meters as follows:
create view v_computed_consumption (ts, id_computed_meter, conso_with_factor) as
SELECT fact.ts,
       params.id_computed_meter,
       sum(fact.gross_consumption * compteur.conversion_factor * params.factor) AS conso_with_factor
FROM param_computed_meters params
JOIN fact_consumption fact ON params.id_meter = fact.id_meter
JOIN dim_meter compteur ON params.id_meter = compteur.id_meter
JOIN dim_computed_meter cm ON cm.id_computed_meter = params.id_computed_meter
GROUP BY fact.ts, params.id_site, params.id_computed_meter;
So here’s where it gets strange:
When I join my view (the fact) to my computed meter dimension like this, I get awesome performance (< 1 sec):
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter = 71
order by ts desc;
When I filter on the name of the computed meter instead of the id, performance drops drastically (about 1 min…):
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.computed_meter_name = 'General Water D'
order by ts desc;
So I tried to make it "less dumb" like this, and performance is great again:
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter =
(select id_computed_meter
from dim_computed_meter
where computed_meter_name = 'General Water D')
order by ts desc;
But this solution is not applicable to the star-schema structure that I use in Power BI to show the data.
Here are the explain plans of the first and the second query. I get that the first makes good use of the indexes and the second doesn’t, but I can’t figure out why, or how to fix it.
Thanks if you got this far; any help or hints would be awesome 🙂
Note: I do use TimescaleDB and my fact table is a hypertable (you will probably see it in the explain plans), but I converted it to a "regular" table and got exactly the same problem, so I’m pretty sure that’s not related.
First explain plan (good performance):
Sort (cost=1305.72..1306.72 rows=400 width=77)
Sort Key: fact_1.ts DESC
-> Nested Loop (cost=1274.21..1288.43 rows=400 width=77)
-> Index Scan using dim_computed_meter_id_computed_meter_index on dim_computed_meter cm (cost=0.27..2.49 rows=1 width=53)
Index Cond: (id_computed_meter = 71)
-> HashAggregate (cost=1273.94..1277.94 rows=400 width=24)
" Group Key: fact_1.ts, params.id_site, params.id_computed_meter"
-> Nested Loop (cost=0.98..1020.71 rows=25323 width=24)
-> Nested Loop (cost=0.55..5.87 rows=2 width=12)
-> Index Only Scan using dim_computed_meter_id_computed_meter_index on dim_computed_meter cm_1 (cost=0.27..2.49 rows=1 width=8)
Index Cond: (id_computed_meter = 71)
-> Index Scan using idx_params_id_computed_meter on param_computed_meters params (cost=0.28..3.36 rows=2 width=12)
Index Cond: (id_computed_meter = 71)
-> Append (cost=0.42..406.00 rows=10142 width=20)
-> Index Only Scan using _hyper_1_1_chunk_fact_consumption_id_meter_ts_index on _hyper_1_1_chunk fact_1 (cost=0.42..12.73 rows=326 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_2_chunk_fact_consumption_id_meter_ts_index on _hyper_1_2_chunk fact_2 (cost=0.42..29.72 rows=511 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_3_chunk_fact_consumption_id_meter_ts_index on _hyper_1_3_chunk fact_3 (cost=0.42..14.39 rows=515 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_4_chunk_fact_consumption_id_meter_ts_index on _hyper_1_4_chunk fact_4 (cost=0.42..17.87 rows=557 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_5_chunk_fact_consumption_id_meter_ts_index on _hyper_1_5_chunk fact_5 (cost=0.42..18.14 rows=541 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_6_chunk_fact_consumption_id_meter_ts_index on _hyper_1_6_chunk fact_6 (cost=0.42..20.96 rows=545 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_7_chunk_fact_consumption_id_meter_ts_index on _hyper_1_7_chunk fact_7 (cost=0.42..15.46 rows=545 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_8_chunk_fact_consumption_id_meter_ts_index on _hyper_1_8_chunk fact_8 (cost=0.42..14.86 rows=542 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_9_chunk_fact_consumption_id_meter_ts_index on _hyper_1_9_chunk fact_9 (cost=0.42..14.40 rows=547 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_10_chunk_fact_consumption_id_meter_ts_index on _hyper_1_10_chunk fact_10 (cost=0.42..14.43 rows=549 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_11_chunk_fact_consumption_id_meter_ts_index on _hyper_1_11_chunk fact_11 (cost=0.42..14.22 rows=537 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_12_chunk_fact_consumption_id_meter_ts_index on _hyper_1_12_chunk fact_12 (cost=0.42..24.57 rows=531 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_13_chunk_fact_consumption_id_meter_ts_index on _hyper_1_13_chunk fact_13 (cost=0.42..17.59 rows=541 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_14_chunk_fact_consumption_id_meter_ts_index on _hyper_1_14_chunk fact_14 (cost=0.42..14.72 rows=534 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_15_chunk_fact_consumption_id_meter_ts_index on _hyper_1_15_chunk fact_15 (cost=0.42..18.22 rows=514 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_16_chunk_fact_consumption_id_meter_ts_index on _hyper_1_16_chunk fact_16 (cost=0.42..16.59 rows=515 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_17_chunk_fact_consumption_id_meter_ts_index on _hyper_1_17_chunk fact_17 (cost=0.42..22.60 rows=513 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_18_chunk_fact_consumption_id_meter_ts_index on _hyper_1_18_chunk fact_18 (cost=0.42..14.42 rows=517 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_19_chunk_fact_consumption_id_meter_ts_index on _hyper_1_19_chunk fact_19 (cost=0.42..30.09 rows=501 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_20_chunk_fact_consumption_id_meter_ts_index on _hyper_1_20_chunk fact_20 (cost=0.42..8.27 rows=260 width=20)
Index Cond: (id_meter = params.id_meter)
-> Seq Scan on _hyper_1_21_chunk fact_21 (cost=0.00..1.02 rows=1 width=20)
Filter: (params.id_meter = id_meter)
Second explain plan (bad performance):
Sort (cost=971085.79..971086.29 rows=202 width=77)
Sort Key: fact_10.ts DESC
-> Hash Join (cost=968813.31..971078.05 rows=202 width=77)
Hash Cond: (params.id_computed_meter = cm.id_computed_meter)
-> Finalize HashAggregate (cost=968810.80..969810.80 rows=100000 width=24)
" Group Key: fact_10.ts, params.id_site, params.id_computed_meter"
-> Gather (cost=956810.80..967810.80 rows=100000 width=24)
Workers Planned: 1
-> Partial HashAggregate (cost=955810.80..956810.80 rows=100000 width=24)
" Group Key: fact_10.ts, params.id_site, params.id_computed_meter"
-> Hash Join (cost=128.33..747935.06 rows=20787574 width=24)
Hash Cond: (fact_10.id_meter = params.id_meter)
-> Parallel Append (cost=0.00..335620.98 rows=10896533 width=20)
-> Parallel Seq Scan on _hyper_1_10_chunk fact_10 (cost=0.00..15472.43 rows=600843 width=20)
-> Parallel Seq Scan on _hyper_1_9_chunk fact_9 (cost=0.00..15402.08 rows=598408 width=20)
-> Parallel Seq Scan on _hyper_1_7_chunk fact_7 (cost=0.00..15353.64 rows=595564 width=20)
-> Parallel Seq Scan on _hyper_1_13_chunk fact_13 (cost=0.00..15346.82 rows=593582 width=20)
-> Parallel Seq Scan on _hyper_1_11_chunk fact_11 (cost=0.00..15267.98 rows=592698 width=20)
-> Parallel Seq Scan on _hyper_1_6_chunk fact_6 (cost=0.00..15226.18 rows=590818 width=20)
-> Parallel Seq Scan on _hyper_1_8_chunk fact_8 (cost=0.00..15206.71 rows=590471 width=20)
-> Parallel Seq Scan on _hyper_1_12_chunk fact_12 (cost=0.00..15054.48 rows=583948 width=20)
-> Parallel Seq Scan on _hyper_1_14_chunk fact_14 (cost=0.00..15017.74 rows=582974 width=20)
-> Parallel Seq Scan on _hyper_1_5_chunk fact_5 (cost=0.00..14962.39 rows=579139 width=20)
-> Parallel Seq Scan on _hyper_1_4_chunk fact_4 (cost=0.00..14694.30 rows=569030 width=20)
-> Parallel Seq Scan on _hyper_1_18_chunk fact_18 (cost=0.00..14385.80 rows=558680 width=20)
-> Parallel Seq Scan on _hyper_1_19_chunk fact_19 (cost=0.00..14321.08 rows=553408 width=20)
-> Parallel Seq Scan on _hyper_1_17_chunk fact_17 (cost=0.00..14310.16 rows=550516 width=20)
-> Parallel Seq Scan on _hyper_1_16_chunk fact_16 (cost=0.00..14110.21 rows=547621 width=20)
-> Parallel Seq Scan on _hyper_1_15_chunk fact_15 (cost=0.00..13918.32 rows=539832 width=20)
-> Parallel Seq Scan on _hyper_1_3_chunk fact_3 (cost=0.00..13428.45 rows=521645 width=20)
-> Parallel Seq Scan on _hyper_1_2_chunk fact_2 (cost=0.00..13373.35 rows=518235 width=20)
-> Parallel Seq Scan on _hyper_1_1_chunk fact_1 (cost=0.00..9161.19 rows=353219 width=20)
-> Parallel Seq Scan on _hyper_1_20_chunk fact_20 (cost=0.00..7124.01 rows=275901 width=20)
-> Parallel Seq Scan on _hyper_1_21_chunk fact_21 (cost=0.00..1.01 rows=1 width=20)
-> Hash (cost=93.44..93.44 rows=2791 width=12)
-> Hash Join (cost=40.14..93.44 rows=2791 width=12)
Hash Cond: (params.id_computed_meter = cm_1.id_computed_meter)
-> Seq Scan on param_computed_meters params (cost=0.00..45.91 rows=2791 width=12)
-> Hash (cost=33.95..33.95 rows=495 width=8)
-> Seq Scan on dim_computed_meter cm_1 (cost=0.00..33.95 rows=495 width=8)
-> Hash (cost=2.49..2.49 rows=1 width=53)
-> Index Scan using dim_computed_meter_computed_meter_name_index_2 on dim_computed_meter cm (cost=0.27..2.49 rows=1 width=53)
Index Cond: (computed_meter_name = 'General Water D'::text)
Tried SET enable_seqscan = off and SET enable_nestloop = off.
Tried running ANALYZE and fiddling with random_page_cost and effective_cache_size.
Tried to get help on DBA forums, but no success yet.
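Concretely, this is the kind of thing I ran (the values for random_page_cost and effective_cache_size are just illustrative, not the exact ones I tested):
SET enable_seqscan = off;
SET enable_nestloop = off;
ANALYZE fact_consumption;
ANALYZE param_computed_meters;
ANALYZE dim_computed_meter;
SET random_page_cost = 1.1;        -- illustrative value
SET effective_cache_size = '8GB';  -- illustrative value
None of this made a difference for the name-filtered query.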
As explained above, this sums up the problem perfectly:
select *
from v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter = 71
order by ts desc; -- < 1 sec result
select * from v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter =
(select id_computed_meter
from dim_computed_meter
where computed_meter_name = 'General Water D')
order by ts desc; -- < 1 sec result
select * from
v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.computed_meter_name = 'General Water D'
order by ts desc; -- > 60 seconds result
2 Answers
Please, try this
I built some test data (pasting it here in case someone needs it…)
Unfortunately I didn’t find a solution.
The problem seems to be that the planner can’t push the WHERE condition down into the GROUP BY (i.e. into the view) unless it applies explicitly to one of the GROUP BY columns and the value is a constant known at planning time. So it works with "id = constant" or "id in (list)", but…
…even this one, which uses the id, results in the slow plan.
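To illustrate the rule with the view from the question (a sketch, not re-tested against the original data): a planning-time constant on the grouping column is pushed down into the view, while a value that only becomes known through the join is not.
-- Pushed down: constant filter on a GROUP BY column of the view,
-- so only the matching meters are aggregated (fast).
select *
from v_computed_consumption v
where v.id_computed_meter in (71, 72)
order by ts desc;
-- Not pushed down: the id is only known after the join on the name,
-- so the whole view is aggregated first and filtered afterwards (slow).
select *
from v_computed_consumption v
join dim_computed_meter cm
  on v.id_computed_meter = cm.id_computed_meter
where cm.computed_meter_name = 'General Water D'
order by ts desc;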