The query below takes far too long to execute:
select s.id, s.description, count(distinct et.id)
from "districts" s
inner join (
    select id.id from "industry_districts" id
) as id2 on id2.id = s.id
inner join (
    select id, industry_districts_id from "industries" i
) as i2 on i2.industry_districts_id = id2.id
inner join (
    select id, industries_id from "subsectors" si
) as si2 on si2.industries_id = i2.id
inner join (
    select id, subsectors_id from "supplys" rs
) as rs2 on rs2.subsectors_id = si2.id
inner join (
    select id, supplys_id, events_id from "event_relations" ers
) as ers2 on ers2.supplys_id = rs2.id
left join (
    select id, events_id from "event_stories"
) et on et.events_id = ers2.events_id
group by s.id
limit 10 offset 0
**EXPLAIN ANALYZE gives the following result:**
Limit  (cost=0.99..20719301.15 rows=10 width=130) (actual time=2107.465..214891.582 rows=10 loops=1)
  ->  GroupAggregate  (cost=0.99..24863161.18 rows=12 width=130) (actual time=2107.462..214891.567 rows=10 loops=1)
        Group Key: s.id
        ->  Nested Loop Left Join  (cost=0.99..23466091.79 rows=279413855 width=126) (actual time=2.073..141429.178 rows=104217089 loops=1)
              ->  Nested Loop  (cost=0.57..607232.85 rows=11644034 width=126) (actual time=2.055..62770.206 rows=8174966 loops=1)
                    ->  Nested Loop  (cost=0.14..3598.24 rows=1389 width=126) (actual time=2.009..56.815 rows=927 loops=1)
                          Join Filter: (si.id = rs.subsectors_id)
                          Rows Removed by Join Filter: 226384
                          ->  Nested Loop  (cost=0.14..117.68 rows=77 width=126) (actual time=1.004..2.502 rows=80 loops=1)
                                Join Filter: (i.id = si.industries_id)
                                Rows Removed by Join Filter: 5309
                                ->  Nested Loop  (cost=0.14..32.60 rows=34 width=126) (actual time=0.513..0.775 rows=34 loops=1)
                                      Join Filter: (id.id = i.industry_districts_id)
                                      Rows Removed by Join Filter: 671
                                      ->  Nested Loop  (cost=0.14..18.13 rows=12 width=126) (actual time=0.443..0.507 rows=11 loops=1)
                                            Join Filter: (s.id = id.id)
                                            Rows Removed by Join Filter: 126
                                            ->  Index Scan using "PK_districts" on "districts" s  (cost=0.14..12.31 rows=12 width=122) (actual time=0.402..0.417 rows=11 loops=1)
                                            ->  Materialize  (cost=0.00..1.38 rows=25 width=4) (actual time=0.002..0.004 rows=12 loops=11)
                                                  ->  Seq Scan on "industry_districts" id  (cost=0.00..1.25 rows=25 width=4) (actual time=0.011..0.019 rows=25 loops=1)
                                      ->  Materialize  (cost=0.00..2.05 rows=70 width=8) (actual time=0.001..0.013 rows=64 loops=11)
                                            ->  Seq Scan on "industries" i  (cost=0.00..1.70 rows=70 width=8) (actual time=0.007..0.023 rows=70 loops=1)
                                ->  Materialize  (cost=0.00..4.38 rows=159 width=8) (actual time=0.010..0.028 rows=158 loops=34)
                                      ->  Seq Scan on "subsectors" si  (cost=0.00..3.59 rows=159 width=8) (actual time=0.345..0.386 rows=159 loops=1)
                          ->  Materialize  (cost=0.00..174.03 rows=2869 width=8) (actual time=0.007..0.307 rows=2841 loops=80)
                                ->  Seq Scan on "supplys" rs  (cost=0.00..159.69 rows=2869 width=8) (actual time=0.578..4.098 rows=2869 loops=1)
                    ->  Index Scan using "IXFK_event_relations_supplys" on "event_relations" ers  (cost=0.44..350.07 rows=8451 width=8) (actual time=0.103..65.007 rows=8819 loops=927)
                          Index Cond: (supplys_id = rs.id)
              ->  Index Scan using "IXFK_event_stories_events" on "event_stories"  (cost=0.42..1.40 rows=56 width=8) (actual time=0.003..0.006 rows=12 loops=8174966)
                    Index Cond: (events_id = ers.events_id)
Planning Time: 16.839 ms
Execution Time: 214891.868 ms
**Indexes created:**
CREATE INDEX "IXFK_event_relations_events" ON public."event_relations" USING btree (events_id);
CREATE INDEX "IXFK_event_relations_supplys" ON public."event_relations" USING btree (supplys_id);
CREATE UNIQUE INDEX "PK_event_relations" ON public."event_relations" USING btree (id);
CREATE INDEX "IXFK_event_stories_events" ON public."event_stories" USING btree (events_id);
CREATE UNIQUE INDEX "PK_event_stories" ON public."event_stories" USING btree (id);
CREATE INDEX "IXFK_industries_industry_districts" ON public."industries" USING btree (industry_districts_id);
CREATE UNIQUE INDEX "PK_industries" ON public."industries" USING btree (id);
CREATE INDEX "IXFK_industry_districts_district s" ON public."industry_districts" USING btree (district s_id);
CREATE UNIQUE INDEX "PK_industry_districts" ON public."industry_districts" USING btree (id);
CREATE UNIQUE INDEX "PK_district s" ON public."district s" USING btree (id);
CREATE INDEX "IXFK_subsectors_industries" ON public."subsectors" USING btree (industries_id);
CREATE UNIQUE INDEX "PK_subsectors" ON public."subsectors" USING btree (id);
I am using Postgres version 14.0. How do I improve the performance of this query?
**2 Answers**
You could get rid of the subselects and allow PostgreSQL to use its internal optimizations while joining. I have no contextual knowledge about your query, so excuse me if I have made a typo. The following equivalent query should run much faster:
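```sql
-- same tables, aliases, and join conditions as the original query,
-- just with the wrapping subselects removed
select s.id, s.description, count(distinct et.id)
from "districts" s
inner join "industry_districts" id2 on id2.id = s.id
inner join "industries" i2 on i2.industry_districts_id = id2.id
inner join "subsectors" si2 on si2.industries_id = i2.id
inner join "supplys" rs2 on rs2.subsectors_id = si2.id
inner join "event_relations" ers2 on ers2.supplys_id = rs2.id
left join "event_stories" et on et.events_id = ers2.events_id
group by s.id
limit 10 offset 0
```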
The returned rows are indeterminate, both in this query and in the one from the original post, because there is no `ORDER BY` clause before the `LIMIT` and `OFFSET`.
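If the paging should be stable from one execution to the next, one option (assuming ordering by the district id is acceptable) is to sort on the group key before applying `LIMIT`/`OFFSET`, i.e. the tail of the query above becomes:

```sql
group by s.id
order by s.id   -- makes the 10 returned rows deterministic
limit 10 offset 0
```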