skip to Main Content

I have 2 tables in Postgres, one is of about 200M records, and the other a few hundreds.
I want to get 10 values from the small table and find the corresponding values in the long table. both are indexed. I’ve tried different methods and only the static one got me in no time to query:

1.

with s as (select * from shorttable limit 10)
select * from longtable where col1 in (select * from s)
select * from longtable where col1 in (select * from shorttable limit 10).
with s as (select * from shorttable limit 10) 
select * from longtable join shorttable on longtable.col1=s.col1

The only good solution I got from querying the short table for the 10 values and then
specifically printing on the following query:

select * from longtable where col1 in ('aaa','bbb',.....)

Is that the only way for getting a fast response?

2

Answers


  1. Your attempt 3. is the way to go. You just didn’t actually use the CTE in the outer SELECT. You introduced a second instance of shorttable by mistake. Fixed:

    WITH s AS (SELECT col1 FROM shorttable LIMIT 10)  -- SELECT * made no sense here
    SELECT *
    FROM   longtable l
    JOIN   s USING (col1)  -- not "shorttable"!
    

    The CTE is optional in this case. Can be simplified to:

    SELECT l.*
    FROM   (SELECT col1 FROM shorttable LIMIT 10) s
    JOIN   longtable l USING (col1);
    

    The USING clause in combination with SELECT * prevents col1 from being output twice. Using l.* instead of just * achieves the same. So this detail is fixed twice here.

    And I wouldn’t use IN here. It’s not a true equivalent of a join, as it effectively removes duplicates from the IN list, which typically just wasted effort, or not intended, or hides breakage. Exceptions apply.

    Further optimization depends on actual value distribution (and corresponding columns statistics) and actual indexes.

    Login or Signup to reply.
  2. Your third attempt is the most feasible and straight forward attempt. A simple left join should surmise in this case. Making use of the common table expression CTE to limit the records from the short table to only the 10 records before left joining on the big table like this

    WITH short_records AS (
       SELECT *
       FROM *short_table*
       LIMIT 10
    )
    SELECT *
    FROM short_records
    LEFT JOIN long_table ON short_records.common_column = long_table.common_column
    

    Using left join, the result will include all 10 records from the short_table with the corresponding data or null from the long_table where they match on the common column. While join, (inner join) will only return data from both tables where they both match on the common column.i.e where there is no corresponding data for any record in the long_table, that record or row is not included in the result.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search