I have a query like:
SELECT
DISTINCT ON (services.group_id)
services.service_id,
...
FROM services
WHERE services.group_id IN ('67240181b97477f054f9b1bc', '67240181b97477f054f9b1be')
I have an index on group_id
.
Why does Postgres need to do an extra heap scan? I just want any 1st match with a matching group_id
. Shouldn’t it already have the correct rows from the Bitmap Index Scan
which checks for matching group_id
?
Unique (cost=609.73..611.66 rows=50 width=240) (actual time=0.353..0.385 rows=2 loops=1)
-> Sort (cost=609.73..610.70 rows=386 width=240) (actual time=0.353..0.364 rows=386 loops=1)
Sort Key: group_id
Sort Method: quicksort Memory: 133kB
-> Bitmap Heap Scan on services (cost=11.56..593.15 rows=386 width=240) (actual time=0.030..0.180 rows=386 loops=1)
Recheck Cond: ((group_id)::text = ANY ('{67240181b97477f054f9b1bc,67240181b97477f054f9b1be}'::text[]))
Heap Blocks: exact=39
-> Bitmap Index Scan on ix_services_group_id (cost=0.00..11.46 rows=386 width=0) (actual time=0.019..0.020 rows=386 loops=1)
Index Cond: ((group_id)::text = ANY ('{67240181b97477f054f9b1bc,67240181b97477f054f9b1be}'::text[]))
Planning Time: 0.505 ms
Execution Time: 0.434 ms
2
Answers
the IN clause works like a OR and that is the cuase for the heap
Use
The
Bitmap Heap Scan
is just an integral step of a bitmap index scan. After building a bitmap for (estimated many) qualifying tuples, Postgres accesses each harboring data page once.But your query is inefficient to begin with. For more than a few hits per
group_id
(avg. 193 in the example) this should be much faster:Assuming
group_id
is typetext
. Else adapt the input accordingly.With this query, Postgres will fetch the first live row from the index. Unless you can make an index-only scan work with a covering index (seems unlikely), a plain index scan (not a bitmap index scan like in your query) still goes to the heap to satisfy the
SELECT
list – but only once per input value (that finds any match). Ignoring possible HOT chains to keep it simple)Besides the more expensive bitmap index scan, this also avoids removing all the duplicates fetched in vain.
Your index
ix_services_group_id
is good for it. If you addORDER BY
criteria, ideally have a matching multicolumn index.See:
Note a subtle difference: This query (unlike your original) does not fold duplicates in the input array (input list in your case). Make sure not to pass duplicates or remove those before the
LATERAL
subquery.