I find my self sometimes needing to select many items of different tables based on a list of id’s or other variables. Given that this list can contain thousands of elements, it is important for me that the filter does not become a n*m query. I have experienced that under some circumstances postgres makes fast lookups and sometimes not, for example if I filter by primary key. When it made slow filtering I typically solved it by creating an IN
query table and join it with the table to make the filter.
But now I’m back at the problem again and I really want to hear if somebody can help me build some intuition for when I should assume that lookup is used and when a hash table is used.
At my current work we are using postgres 15, but most of my experience is from postgres 10, so there might be some bad habits from older versions.
2
Answers
Why assume if you can
explain analyze verbose
. It’s hard to comment on whyIN
worked so well without seeing examples of what it replaced and how.Even having good enough intuition to trust it, you’re still better off if you trust, but verify. Server config, index setup, table size, traffic and how often it’s
vacuum analyze
d can all have impact on what the planner comes up with, and it can change version to version. All human intuition in this context is doomed to be unreliable and get quickly outdated.That being said, here are some 1M-row tests at db<>fiddle, here are some comparisons of
in
,any
,exists
,join
behaviour, here’s some more, more visual.Previously I was using
column = ANY (VALUES (...), ...)
at some point instead of making N batch requests withIN
because it was faster (basically it creates a temp table and then joins with it) but looks like in Postgres 15 it’s not a case and regularIN
even with 2000 values works 2x faster than value list filter. (But maybe at some N it’s still faster thatIN
)Having
sku_test
table setup asand with around 22k records
results in
and
results in
And I see similar results on production database with live table and index on filtering column.
But again, it’s always better to measure by yourself.