I have 2 tables in Postgres, one is of about 200M records, and the other a few hundreds.
I want to get 10 values from the small table and find the corresponding values in the long table. both are indexed. I’ve tried different methods and only the static one got me in no time to query:
1.
with s as (select * from shorttable limit 10)
select * from longtable where col1 in (select * from s)
select * from longtable where col1 in (select * from shorttable limit 10).
with s as (select * from shorttable limit 10)
select * from longtable join shorttable on longtable.col1=s.col1
The only good solution I got from querying the short table for the 10 values and then
specifically printing on the following query:
select * from longtable where col1 in ('aaa','bbb',.....)
Is that the only way for getting a fast response?
2
Answers
Your attempt 3. is the way to go. You just didn’t actually use the CTE in the outer
SELECT
. You introduced a second instance ofshorttable
by mistake. Fixed:The CTE is optional in this case. Can be simplified to:
The
USING
clause in combination withSELECT *
preventscol1
from being output twice. Usingl.*
instead of just*
achieves the same. So this detail is fixed twice here.And I wouldn’t use
IN
here. It’s not a true equivalent of a join, as it effectively removes duplicates from theIN
list, which typically just wasted effort, or not intended, or hides breakage. Exceptions apply.Further optimization depends on actual value distribution (and corresponding columns statistics) and actual indexes.
Your third attempt is the most feasible and straight forward attempt. A simple left join should surmise in this case. Making use of the common table expression CTE to limit the records from the short table to only the 10 records before left joining on the big table like this
Using left join, the result will include all 10 records from the short_table with the corresponding data or null from the long_table where they match on the common column. While join, (inner join) will only return data from both tables where they both match on the common column.i.e where there is no corresponding data for any record in the long_table, that record or row is not included in the result.