I have the following SQL statement:
SELECT DISTINCT e.eventid,
e.objectid,
e.clock,
e.ns,
e.name,
e.severity
FROM events e,
functions f,
items i,
hosts_groups hg
WHERE e.source = '0'
AND e.object = '0'
AND NOT EXISTS (SELECT NULL
FROM functions f,
items i,
hosts_groups hgg
LEFT JOIN rights r
ON r.id = hgg.groupid
AND r.groupid IN ( 13, 95, 129, 498,
853, 1154, 1279, 1429
)
WHERE e.objectid = f.triggerid
AND f.itemid = i.itemid
AND i.hostid = hgg.hostid
GROUP BY i.hostid
HAVING Max(permission) < 2
OR Min(permission) IS NULL
OR Min(permission) = 0)
AND e.objectid = f.triggerid
AND f.itemid = i.itemid
AND i.hostid = hg.hostid
AND hg.groupid IN ( 101, 102, 191, 195,
198, 199, 200, 203,
206, 320, 324, 402,
403, 405, 406, 410,
411, 414, 415, 416,
417, 420, 421, 422,
423, 425, 426, 427,
432, 434, 435, 436,
437, 438, 441, 503,
504, 571, 1230, 1390,
1391, 1534, 1840, 1841, 2925 )
AND e.value = 1
ORDER BY e.eventid DESC
LIMIT 501;
Execution plan is:
Limit (cost=176751661.81..176751670.58 rows=501 width=86) (actual time=940324.730..940347.647 rows=501 loops=1)
Buffers: shared hit=850499789, temp read=19995 written=21317
-> Unique (cost=176751661.81..176760246.35 rows=490545 width=86) (actual time=940324.726..940347.030 rows=501 loops=1)
Buffers: shared hit=850499789, temp read=19995 written=21317
-> Sort (cost=176751661.81..176752888.18 rows=490545 width=86) (actual time=940324.723..940345.468 rows=1501 loops=1)
Sort Key: e.eventid DESC, e.objectid, e.clock, e.ns, e.name, e.severity
Sort Method: external merge Disk: 17392kB
Buffers: shared hit=850499789, temp read=19995 written=21317
-> Hash Join (cost=216403.41..176691546.09 rows=490545 width=86) (actual time=5736.438..940072.099 rows=168010 loops=1)
Hash Cond: (e.objectid = f.triggerid)
Buffers: shared hit=850499780, temp read=19139 written=19139
-> Index Scan using events_1 on events e (cost=0.56..176432359.99 rows=1177280 width=86) (actual time=1.918..934059.394 rows=2360086 loops=1)
Index Cond: ((source = 0) AND (object = 0))
Filter: ((value = 1) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 2034965
Buffers: shared hit=849942562
SubPlan 1
-> HashAggregate (cost=40.78..41.89 rows=25 width=40) (actual time=0.390..0.390 rows=0 loops=2360086)
Group Key: i_1.hostid
Filter: ((max(r.permission) < 2) OR (min(r.permission) IS NULL) OR (min(r.permission) = 0))
Batches: 1 Memory Usage: 24kB
Rows Removed by Filter: 1
Buffers: shared hit=845651503
-> Nested Loop Left Join (cost=1.57..40.04 rows=74 width=12) (actual time=0.019..0.354 rows=43 loops=2360086)
Buffers: shared hit=845651503
-> Nested Loop (cost=1.28..14.16 rows=16 width=16) (actual time=0.013..0.052 rows=12 loops=2360086)
Buffers: shared hit=48616416
-> Nested Loop (cost=0.86..12.49 rows=3 width=8) (actual time=0.008..0.017 rows=2 loops=2360086)
Buffers: shared hit=27730393
-> Index Scan using functions_1 on functions f_1 (cost=0.43..4.55 rows=3 width=8) (actual time=0.003..0.005 rows=2 loops=2360086)
Index Cond: (triggerid = e.objectid)
Buffers: shared hit=9551365
-> Index Scan using items_pkey on items i_1 (cost=0.43..2.65 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=4544491)
Index Cond: (itemid = f_1.itemid)
Buffers: shared hit=18178425
-> Index Only Scan using hosts_groups_1 on hosts_groups hgg (cost=0.42..0.51 rows=5 width=16) (actual time=0.003..0.008 rows=6 loops=4544491)
Index Cond: (hostid = i_1.hostid)
Heap Fetches: 9037804
Buffers: shared hit=20886023
-> Index Scan using rights_2 on rights r (cost=0.29..1.57 rows=5 width=12) (actual time=0.005..0.019 rows=3 loops=28604753)
Index Cond: (id = hgg.groupid)
Filter: (groupid = ANY ('{13,95,129,498,853,1154,1279,1429}'::bigint[]))
Rows Removed by Filter: 29
Buffers: shared hit=797035087
-> Hash (cost=211065.22..211065.22 rows=325330 width=8) (actual time=2247.572..2267.285 rows=614462 loops=1)
Buckets: 262144 Batches: 4 Memory Usage: 8053kB
Buffers: shared hit=557218, temp written=1575
-> Gather (cost=135136.65..211065.22 rows=325330 width=8) (actual time=771.135..1781.991 rows=614462 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=557218
-> Parallel Hash Join (cost=134136.65..177532.22 rows=81332 width=8) (actual time=736.027..1714.938 rows=122892 loops=5)
Hash Cond: (f.itemid = i.itemid)
Buffers: shared hit=557218
-> Parallel Seq Scan on functions f (cost=0.00..40599.51 rows=675652 width=16) (actual time=0.028..401.843 rows=535373 loops=5)
Buffers: shared hit=33843
-> Parallel Hash (cost=132475.65..132475.65 rows=132880 width=8) (actual time=731.952..731.986 rows=107386 loops=5)
Buckets: 1048576 (originally 524288) Batches: 1 (originally 1) Memory Usage: 33376kB
Buffers: shared hit=523175
-> Nested Loop (cost=0.97..132475.65 rows=132880 width=8) (actual time=1.923..551.210 rows=107386 loops=5)
Buffers: shared hit=523175
-> Parallel Index Only Scan using hosts_groups_1 on hosts_groups hg (cost=0.42..3481.74 rows=791 width=8) (actual time=1.742..16.262 rows=464 loops=5)
Filter: (groupid = ANY ('{101,102,191,195,198,199,200,203,206,320,324,402,403,405,406,410,411,414,415,416,417,420,421,422,423,425,426,427,432,434,435,436,437,438,441,503,504,571,1230,1390,1391,1534,1840,1841,2925}'::bigint[]))
Rows Removed by Filter: 20657
Heap Fetches: 29978
Buffers: shared hit=23995
-> Index Scan using items_9 on items i (cost=0.56..159.99 rows=309 width=16) (actual time=0.023..0.850 rows=232 loops=2318)
Index Cond: (hostid = hg.hostid)
Buffers: shared hit=499180
Planning:
Buffers: shared hit=196
Planning Time: 1.830 ms
Execution Time: 940388.537 ms
(73 rows)
https://explain.depesz.com/s/dOQR9
The problem stands out on scanning the index events_1.
The table and the index definitions are:
zabbix=# d events
Table "public.events"
Column | Type | Collation | Nullable | Default
--------------+-------------------------+-----------+----------+-----------------------
eventid | bigint | | not null |
source | integer | | not null | 0
object | integer | | not null | 0
objectid | bigint | | not null | '0'::bigint
clock | integer | | not null | 0
value | integer | | not null | 0
acknowledged | integer | | not null | 0
ns | integer | | not null | 0
name | character varying(2048) | | not null | ''::character varying
severity | integer | | not null | 0
Indexes:
"events_pkey" PRIMARY KEY, btree (eventid)
"events_1" btree (source, object, objectid, clock)
"events_2" btree (source, object, clock)
The statistics are up to date and correct. Also, statistics targets are also correct and enough(cross – checking pg_stats values for columns and getting real counts from table nearly the same).
Yet, even though the statistics are fine the estimation on events_1 is not correct, it is underestimating(I guess due to subplan, not exist part).
How can I convert this not exist
to left join with is null
? Also, is there any other way to optimize it?
Tried to set enable_indexscan = off;
, but it was worse.
Do I simply hit the limits?
Thanks!
3
Answers
The quick win would be to create this index:
If
permission
is a column ofrights
, add it to the index in anINCLUDE
clause.The big win would be if you could somehow rewrite the
NOT EXISTS
subquery so that it doesn’t useGROUP BY
.Assuming I correctly understood the posted query and haven’t made any typos, the following query should return the same results as the original:
The main performance hit in the original query is the need to execute the
EXISTS
operation’s subquery for each event row. TheNOT EXISTS
condition in the original query is logically equivalent to testing for the existence of at least one matching row inrights
wherepermission >= 2
. The join criteria in the original subquery are nearly identical to those of the outer query, except thathosts_groups
is not filtered in the subquery. This version reduces the redunant table references and eliminates the need to perform per-row subqueries.I haven’t run the query (since I don’t have useful sample data to test with), so I await your feedback as to whether it performs better than the original query and returns the correct result set.
To convert the NOT EXISTS to a left join, I think you can just remove the f.triggerid test from the WHERE and add f.triggerid as a column into the GROUP BY and the select-list.
And then do a left join to this subquery
ON e.objectid = subquery.triggerid
and then in outer WHERE test that subquery.triggerid is NULL. This will compute all the aggregates for all of the triggerid in bulk rather than doing each triggerid piecemeal (which might or might not actually be faster, we can’t tell just from the info shown)