I’m using a PostgreSQL (15.5) database with the following logic:
We have "meters" that measure electrical consumption.
We have "computed meters" that are calculated from the measurements of several meters. For example, "computedMeterA" = "meterA" + "meterB".
I have classical "dimension" tables for meters (2,500 records) and computed meters, with id, name, …, and a fact_consumption table with the hourly data for each meter (about 20 million records).
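For reference, here is a simplified sketch of those tables, limited to the columns used in the view below (the real tables have more columns, and some names may differ slightly):
create table dim_meter (
    id_meter integer primary key,
    meter_name text,                      -- simplified, the real dimension has more attributes
    conversion_factor double precision
);
create table dim_computed_meter (
    id_computed_meter integer primary key,
    computed_meter_name text
);
create table fact_consumption (
    ts timestamp not null,                -- hourly timestamp
    id_meter integer references dim_meter,
    gross_consumption double precision
);  -- in reality a TimescaleDB hypertable partitioned on ts (see the note below)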
The combinations that make up the computed meters are stored in a parameters table with the following structure:
create table param_computed_meters (
    id integer not null
        constraint param_computed_meters_pk
            primary key,
    id_computed_meter integer
        constraint param_computed_meters_id_computed_meter_fk
            references powerbi.dim_computed_meter
            on delete cascade,
    id_meter integer,
    factor double precision
);
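For example, "computedMeterA" = "meterA" + "meterB" ends up as two rows pointing at the same computed meter, each with factor = 1 (the ids below are purely illustrative):
insert into param_computed_meters (id, id_computed_meter, id_meter, factor)
values (1, 1, 10, 1.0),   -- computedMeterA (id 1) takes meterA (id 10) ...
       (2, 1, 11, 1.0);   -- ... plus meterB (id 11)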
And I created a view that calculates the consumption of the computed meters as follows:
create view v_computed_consumption (ts, id_computed_meter, conso_with_factor) as
SELECT fact.ts,
       params.id_computed_meter,
       sum(fact.gross_consumption * compteur.conversion_factor * params.factor) AS conso_with_factor
FROM param_computed_meters params
JOIN fact_consumption fact ON params.id_meter = fact.id_meter
JOIN dim_meter compteur ON params.id_meter = compteur.id_meter
JOIN dim_computed_meter cm ON cm.id_computed_meter = params.id_computed_meter
GROUP BY fact.ts, params.id_site, params.id_computed_meter;
So here’s where it gets strange:
When I join my view (the fact) to my computed meter dimension like this, I get awesome performance (< 1 sec):
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter = 71
order by ts desc;
When I filter on the name of the computed meter instead of the id, performance drops drastically (about 1 min…):
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.computed_meter_name = 'General Water D'
order by ts desc;
So I tried to make it "less dumb" like this, and performance is great again:
select *
from v_computed_consumption v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter =
(select id_computed_meter
from dim_computed_meter
where computed_meter_name = 'General Water D')
order by ts desc;
But this solution is not applicable to the star-schema structure that I use in Power BI to show the data.
Here are the explain plans of the first and the second query. I get that the first makes good use of the indexes and the second doesn’t, but I can’t figure out why, or how to fix it.
Thanks if you got this far; any help or hints would be awesome 🙂
Note: I do use TimescaleDB and my fact table is a hypertable (you will probably see it in the explain plans), but I converted it to a "regular" table and got exactly the same problem, so I’m pretty sure that’s not related.
First explain plan (good performance):
Sort (cost=1305.72..1306.72 rows=400 width=77)
Sort Key: fact_1.ts DESC
-> Nested Loop (cost=1274.21..1288.43 rows=400 width=77)
-> Index Scan using dim_computed_meter_id_computed_meter_index on dim_computed_meter cm (cost=0.27..2.49 rows=1 width=53)
Index Cond: (id_computed_meter = 71)
-> HashAggregate (cost=1273.94..1277.94 rows=400 width=24)
" Group Key: fact_1.ts, params.id_site, params.id_computed_meter"
-> Nested Loop (cost=0.98..1020.71 rows=25323 width=24)
-> Nested Loop (cost=0.55..5.87 rows=2 width=12)
-> Index Only Scan using dim_computed_meter_id_computed_meter_index on dim_computed_meter cm_1 (cost=0.27..2.49 rows=1 width=8)
Index Cond: (id_computed_meter = 71)
-> Index Scan using idx_params_id_computed_meter on param_computed_meters params (cost=0.28..3.36 rows=2 width=12)
Index Cond: (id_computed_meter = 71)
-> Append (cost=0.42..406.00 rows=10142 width=20)
-> Index Only Scan using _hyper_1_1_chunk_fact_consumption_id_meter_ts_index on _hyper_1_1_chunk fact_1 (cost=0.42..12.73 rows=326 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_2_chunk_fact_consumption_id_meter_ts_index on _hyper_1_2_chunk fact_2 (cost=0.42..29.72 rows=511 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_3_chunk_fact_consumption_id_meter_ts_index on _hyper_1_3_chunk fact_3 (cost=0.42..14.39 rows=515 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_4_chunk_fact_consumption_id_meter_ts_index on _hyper_1_4_chunk fact_4 (cost=0.42..17.87 rows=557 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_5_chunk_fact_consumption_id_meter_ts_index on _hyper_1_5_chunk fact_5 (cost=0.42..18.14 rows=541 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_6_chunk_fact_consumption_id_meter_ts_index on _hyper_1_6_chunk fact_6 (cost=0.42..20.96 rows=545 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_7_chunk_fact_consumption_id_meter_ts_index on _hyper_1_7_chunk fact_7 (cost=0.42..15.46 rows=545 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_8_chunk_fact_consumption_id_meter_ts_index on _hyper_1_8_chunk fact_8 (cost=0.42..14.86 rows=542 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_9_chunk_fact_consumption_id_meter_ts_index on _hyper_1_9_chunk fact_9 (cost=0.42..14.40 rows=547 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_10_chunk_fact_consumption_id_meter_ts_index on _hyper_1_10_chunk fact_10 (cost=0.42..14.43 rows=549 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_11_chunk_fact_consumption_id_meter_ts_index on _hyper_1_11_chunk fact_11 (cost=0.42..14.22 rows=537 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_12_chunk_fact_consumption_id_meter_ts_index on _hyper_1_12_chunk fact_12 (cost=0.42..24.57 rows=531 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_13_chunk_fact_consumption_id_meter_ts_index on _hyper_1_13_chunk fact_13 (cost=0.42..17.59 rows=541 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_14_chunk_fact_consumption_id_meter_ts_index on _hyper_1_14_chunk fact_14 (cost=0.42..14.72 rows=534 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_15_chunk_fact_consumption_id_meter_ts_index on _hyper_1_15_chunk fact_15 (cost=0.42..18.22 rows=514 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_16_chunk_fact_consumption_id_meter_ts_index on _hyper_1_16_chunk fact_16 (cost=0.42..16.59 rows=515 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_17_chunk_fact_consumption_id_meter_ts_index on _hyper_1_17_chunk fact_17 (cost=0.42..22.60 rows=513 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_18_chunk_fact_consumption_id_meter_ts_index on _hyper_1_18_chunk fact_18 (cost=0.42..14.42 rows=517 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_19_chunk_fact_consumption_id_meter_ts_index on _hyper_1_19_chunk fact_19 (cost=0.42..30.09 rows=501 width=20)
Index Cond: (id_meter = params.id_meter)
-> Index Only Scan using _hyper_1_20_chunk_fact_consumption_id_meter_ts_index on _hyper_1_20_chunk fact_20 (cost=0.42..8.27 rows=260 width=20)
Index Cond: (id_meter = params.id_meter)
-> Seq Scan on _hyper_1_21_chunk fact_21 (cost=0.00..1.02 rows=1 width=20)
Filter: (params.id_meter = id_meter)
Second explain plan (bad performance):
Sort (cost=971085.79..971086.29 rows=202 width=77)
Sort Key: fact_10.ts DESC
-> Hash Join (cost=968813.31..971078.05 rows=202 width=77)
Hash Cond: (params.id_computed_meter = cm.id_computed_meter)
-> Finalize HashAggregate (cost=968810.80..969810.80 rows=100000 width=24)
" Group Key: fact_10.ts, params.id_site, params.id_computed_meter"
-> Gather (cost=956810.80..967810.80 rows=100000 width=24)
Workers Planned: 1
-> Partial HashAggregate (cost=955810.80..956810.80 rows=100000 width=24)
" Group Key: fact_10.ts, params.id_site, params.id_computed_meter"
-> Hash Join (cost=128.33..747935.06 rows=20787574 width=24)
Hash Cond: (fact_10.id_meter = params.id_meter)
-> Parallel Append (cost=0.00..335620.98 rows=10896533 width=20)
-> Parallel Seq Scan on _hyper_1_10_chunk fact_10 (cost=0.00..15472.43 rows=600843 width=20)
-> Parallel Seq Scan on _hyper_1_9_chunk fact_9 (cost=0.00..15402.08 rows=598408 width=20)
-> Parallel Seq Scan on _hyper_1_7_chunk fact_7 (cost=0.00..15353.64 rows=595564 width=20)
-> Parallel Seq Scan on _hyper_1_13_chunk fact_13 (cost=0.00..15346.82 rows=593582 width=20)
-> Parallel Seq Scan on _hyper_1_11_chunk fact_11 (cost=0.00..15267.98 rows=592698 width=20)
-> Parallel Seq Scan on _hyper_1_6_chunk fact_6 (cost=0.00..15226.18 rows=590818 width=20)
-> Parallel Seq Scan on _hyper_1_8_chunk fact_8 (cost=0.00..15206.71 rows=590471 width=20)
-> Parallel Seq Scan on _hyper_1_12_chunk fact_12 (cost=0.00..15054.48 rows=583948 width=20)
-> Parallel Seq Scan on _hyper_1_14_chunk fact_14 (cost=0.00..15017.74 rows=582974 width=20)
-> Parallel Seq Scan on _hyper_1_5_chunk fact_5 (cost=0.00..14962.39 rows=579139 width=20)
-> Parallel Seq Scan on _hyper_1_4_chunk fact_4 (cost=0.00..14694.30 rows=569030 width=20)
-> Parallel Seq Scan on _hyper_1_18_chunk fact_18 (cost=0.00..14385.80 rows=558680 width=20)
-> Parallel Seq Scan on _hyper_1_19_chunk fact_19 (cost=0.00..14321.08 rows=553408 width=20)
-> Parallel Seq Scan on _hyper_1_17_chunk fact_17 (cost=0.00..14310.16 rows=550516 width=20)
-> Parallel Seq Scan on _hyper_1_16_chunk fact_16 (cost=0.00..14110.21 rows=547621 width=20)
-> Parallel Seq Scan on _hyper_1_15_chunk fact_15 (cost=0.00..13918.32 rows=539832 width=20)
-> Parallel Seq Scan on _hyper_1_3_chunk fact_3 (cost=0.00..13428.45 rows=521645 width=20)
-> Parallel Seq Scan on _hyper_1_2_chunk fact_2 (cost=0.00..13373.35 rows=518235 width=20)
-> Parallel Seq Scan on _hyper_1_1_chunk fact_1 (cost=0.00..9161.19 rows=353219 width=20)
-> Parallel Seq Scan on _hyper_1_20_chunk fact_20 (cost=0.00..7124.01 rows=275901 width=20)
-> Parallel Seq Scan on _hyper_1_21_chunk fact_21 (cost=0.00..1.01 rows=1 width=20)
-> Hash (cost=93.44..93.44 rows=2791 width=12)
-> Hash Join (cost=40.14..93.44 rows=2791 width=12)
Hash Cond: (params.id_computed_meter = cm_1.id_computed_meter)
-> Seq Scan on param_computed_meters params (cost=0.00..45.91 rows=2791 width=12)
-> Hash (cost=33.95..33.95 rows=495 width=8)
-> Seq Scan on dim_computed_meter cm_1 (cost=0.00..33.95 rows=495 width=8)
-> Hash (cost=2.49..2.49 rows=1 width=53)
-> Index Scan using dim_computed_meter_computed_meter_name_index_2 on dim_computed_meter cm (cost=0.27..2.49 rows=1 width=53)
Index Cond: (computed_meter_name = 'General Water D'::text)
Tried SET enable_seqscan = off and SET enable_nestloop = off.
Tried running ANALYZE and fiddling with random_page_cost and effective_cache_size.
Tried to get help on DBA forums, but no success yet.
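Concretely, this is the kind of thing I ran (the values for random_page_cost and effective_cache_size are just illustrative, not the exact ones I tested):
SET enable_seqscan = off;
SET enable_nestloop = off;
ANALYZE fact_consumption;
ANALYZE param_computed_meters;
ANALYZE dim_computed_meter;
SET random_page_cost = 1.1;        -- illustrative value
SET effective_cache_size = '8GB';  -- illustrative value
None of this made a difference for the name-filtered query.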
As explained above, this sums up the problem perfectly:
select *
from v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter = 71
order by ts desc; -- < 1 sec result
select * from v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.id_computed_meter =
(select id_computed_meter
from dim_computed_meter
where computed_meter_name = 'General Water D')
order by ts desc; -- < 1 sec result
select * from
v_computed_consumption2 v
join dim_computed_meter cm
on (v.id_computed_meter = cm.id_computed_meter)
where cm.computed_meter_name = 'General Water D'
order by ts desc; -- > 60 seconds result
2 Answers
Please, try this
I built some test data (pasting it here in case someone needs it…)
Unfortunately I didn’t find a solution.
The problem seems to be that the planner can’t push the WHERE condition down into the GROUP BY (i.e. into the view) unless it applies explicitly to one of the GROUP BY columns and the value is a constant known at planning time. So it works with "id = constant" or "id in (list)", but…
…even this one, which uses the id, results in the slow plan.
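To illustrate the rule with the view from the question (a sketch, not re-tested against the original data): a planning-time constant on the grouping column is pushed down into the view, while a value that only becomes known through the join is not.
-- Pushed down: constant filter on a GROUP BY column of the view,
-- so only the matching meters are aggregated (fast).
select *
from v_computed_consumption v
where v.id_computed_meter in (71, 72)
order by ts desc;
-- Not pushed down: the id is only known after the join on the name,
-- so the whole view is aggregated first and filtered afterwards (slow).
select *
from v_computed_consumption v
join dim_computed_meter cm
  on v.id_computed_meter = cm.id_computed_meter
where cm.computed_meter_name = 'General Water D'
order by ts desc;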