The query below takes far too long to execute:
select s.id, s.description, count(distinct et.id)
from "districts" s
inner join (
    select id.id from "industry_districts" id
) as id2 on id2.id = s.id
inner join (
    select id, industry_districts_id from "industries" i
) as i2 on i2.industry_districts_id = id2.id
inner join (
    select id, industries_id from "subsectors" si
) as si2 on si2.industries_id = i2.id
inner join (
    select id, subsectors_id from "supplys" rs
) as rs2 on rs2.subsectors_id = si2.id
inner join (
    select id, supplys_id, events_id from "event_relations" ers
) as ers2 on ers2.supplys_id = rs2.id
left join (
    select id, events_id from "event_stories"
) et on et.events_id = ers2.events_id
group by s.id
limit 10 offset 0
**EXPLAIN ANALYZE gives the following result:**
Limit  (cost=0.99..20719301.15 rows=10 width=130) (actual time=2107.465..214891.582 rows=10 loops=1)
  ->  GroupAggregate  (cost=0.99..24863161.18 rows=12 width=130) (actual time=2107.462..214891.567 rows=10 loops=1)
        Group Key: s.id
        ->  Nested Loop Left Join  (cost=0.99..23466091.79 rows=279413855 width=126) (actual time=2.073..141429.178 rows=104217089 loops=1)
              ->  Nested Loop  (cost=0.57..607232.85 rows=11644034 width=126) (actual time=2.055..62770.206 rows=8174966 loops=1)
                    ->  Nested Loop  (cost=0.14..3598.24 rows=1389 width=126) (actual time=2.009..56.815 rows=927 loops=1)
                          Join Filter: (si.id = rs.subsectors_id)
                          Rows Removed by Join Filter: 226384
                          ->  Nested Loop  (cost=0.14..117.68 rows=77 width=126) (actual time=1.004..2.502 rows=80 loops=1)
                                Join Filter: (i.id = si.industries_id)
                                Rows Removed by Join Filter: 5309
                                ->  Nested Loop  (cost=0.14..32.60 rows=34 width=126) (actual time=0.513..0.775 rows=34 loops=1)
                                      Join Filter: (id.id = i.industry_districts_id)
                                      Rows Removed by Join Filter: 671
                                      ->  Nested Loop  (cost=0.14..18.13 rows=12 width=126) (actual time=0.443..0.507 rows=11 loops=1)
                                            Join Filter: (s.id = id.id)
                                            Rows Removed by Join Filter: 126
                                            ->  Index Scan using "PK_districts" on "districts" s  (cost=0.14..12.31 rows=12 width=122) (actual time=0.402..0.417 rows=11 loops=1)
                                            ->  Materialize  (cost=0.00..1.38 rows=25 width=4) (actual time=0.002..0.004 rows=12 loops=11)
                                                  ->  Seq Scan on "industry_districts" id  (cost=0.00..1.25 rows=25 width=4) (actual time=0.011..0.019 rows=25 loops=1)
                                      ->  Materialize  (cost=0.00..2.05 rows=70 width=8) (actual time=0.001..0.013 rows=64 loops=11)
                                            ->  Seq Scan on "industries" i  (cost=0.00..1.70 rows=70 width=8) (actual time=0.007..0.023 rows=70 loops=1)
                                ->  Materialize  (cost=0.00..4.38 rows=159 width=8) (actual time=0.010..0.028 rows=158 loops=34)
                                      ->  Seq Scan on "subsectors" si  (cost=0.00..3.59 rows=159 width=8) (actual time=0.345..0.386 rows=159 loops=1)
                          ->  Materialize  (cost=0.00..174.03 rows=2869 width=8) (actual time=0.007..0.307 rows=2841 loops=80)
                                ->  Seq Scan on "supplys" rs  (cost=0.00..159.69 rows=2869 width=8) (actual time=0.578..4.098 rows=2869 loops=1)
                    ->  Index Scan using "IXFK_event_relations_supplys" on "event_relations" ers  (cost=0.44..350.07 rows=8451 width=8) (actual time=0.103..65.007 rows=8819 loops=927)
                          Index Cond: (supplys_id = rs.id)
              ->  Index Scan using "IXFK_event_stories_events" on "event_stories"  (cost=0.42..1.40 rows=56 width=8) (actual time=0.003..0.006 rows=12 loops=8174966)
                    Index Cond: (events_id = ers.events_id)
Planning Time: 16.839 ms
Execution Time: 214891.868 ms
**Indexes created:**
CREATE INDEX "IXFK_event_relations_events" ON public."event_relations" USING btree (events_id);
CREATE INDEX "IXFK_event_relations_supplys" ON public."event_relations" USING btree (supplys_id);
CREATE UNIQUE INDEX "PK_event_relations" ON public."event_relations" USING btree (id);
CREATE INDEX "IXFK_event_stories_events" ON public."event_stories" USING btree (events_id);
CREATE UNIQUE INDEX "PK_event_stories" ON public."event_stories" USING btree (id);
CREATE INDEX "IXFK_industries_industry_districts" ON public."industries" USING btree (industry_districts_id);
CREATE UNIQUE INDEX "PK_industries" ON public."industries" USING btree (id);
CREATE INDEX "IXFK_industry_districts_district s" ON public."industry_districts" USING btree (district s_id);
CREATE UNIQUE INDEX "PK_industry_districts" ON public."industry_districts" USING btree (id);
CREATE UNIQUE INDEX "PK_district s" ON public."district s" USING btree (id);
CREATE INDEX "IXFK_subsectors_industries" ON public."subsectors" USING btree (industries_id);
CREATE UNIQUE INDEX "PK_subsectors" ON public."subsectors" USING btree (id);
I am using Postgres version 14.0. How do I improve the performance of this query?
**2 Answers**
You could get rid of the subselects and allow PostgreSQL to use its internal optimizations while joining. I have no contextual knowledge about your query, so excuse me if I have made a typo. The following equivalent query should run much faster:
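```sql
-- same tables, aliases, and join conditions as the original query,
-- just with the wrapping subselects removed
select s.id, s.description, count(distinct et.id)
from "districts" s
inner join "industry_districts" id2 on id2.id = s.id
inner join "industries" i2 on i2.industry_districts_id = id2.id
inner join "subsectors" si2 on si2.industries_id = i2.id
inner join "supplys" rs2 on rs2.subsectors_id = si2.id
inner join "event_relations" ers2 on ers2.supplys_id = rs2.id
left join "event_stories" et on et.events_id = ers2.events_id
group by s.id
limit 10 offset 0
```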
The returned rows are indeterminate, both in this query and in the one from the original post, because there is no `ORDER BY` clause before the `LIMIT` and `OFFSET`.
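If the paging should be stable from one execution to the next, one option (assuming ordering by the district id is acceptable) is to sort on the group key before applying `LIMIT`/`OFFSET`, i.e. the tail of the query above becomes:

```sql
group by s.id
order by s.id   -- makes the 10 returned rows deterministic
limit 10 offset 0
```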