skip to Main Content

I’m trying to figure out how exactly ON and WHERE work when doing a JOIN

My understanding based on other questions is that I should filter in where when the condition is on the left side of the join, and in the on clause when filtering on the right side

select * from A left join B 
ON A.something=B.something AND {some_condition_on_B} 
WHERE {some_condition_on_A}

I have encountered an issue with a partitioned A table

this select is very slow :

select * from 
A 
left join 
B ON A.something=B.something 
WHERE {A.foo = bar}

while this makes the query faster:

select * from 
A 
left join B ON A.something=B.something AND {A.foo = bar}
WHERE {A.foo = bar}

My question is how come the second select is faster ? Is it something I’m missing, because it seems that it shouldn’t make a difference

2

Answers


  1. select * from 
    A 
    left join 
    B ON A.something=B.something 
    WHERE {A.foo = bar}
    

    This joins the tables together and anything which matches A.something = B.something will join. Anything else will have a bunch of null values (more data to be transfered before the final filter)

    select * from 
    A 
    left join B ON A.something=B.something AND {A.foo = bar}
    WHERE {A.foo = bar}
    

    This one joins the tables together, but will result in more null records if nothing was found which matches the condition {A.foo = bar} as well as A.Something = B.something. So less data is being transferred around before we get to the Where condition.

    Both at the end are checking if they are equal (a null will never equals what it is being compared against, even another null)

    Login or Signup to reply.
  2. Don’t ask people to explain, ask your PostgreSQL to explain: demo

    create table A(something text,foo text,some_condition_on_A boolean);
    
    create table B(something text,some_condition_on_B boolean);
    
    explain--why "this select is very slow"
    select * from 
    A 
    left join 
    B ON A.something=B.something 
    WHERE A.foo = 'bar'
    
    QUERY PLAN
    Hash Right Join (cost=20.93..49.48 rows=27 width=98)
      Hash Cond: (b.something = a.something)
      -> Seq Scan on b (cost=0.00..23.30 rows=1330 width=33)
      -> Hash (cost=20.88..20.88 rows=4 width=65)
            -> Seq Scan on a (cost=0.00..20.88 rows=4 width=65)
                  Filter: (foo = ‘bar’::text)
    explain--why "this makes the query faster"
    select * from 
    A 
    left join B ON A.something=B.something AND A.foo = 'bar'
    WHERE A.foo = 'bar'
    
    QUERY PLAN
    Hash Right Join (cost=20.93..49.55 rows=4 width=98)
      Hash Cond: (b.something = a.something)
      Join Filter: (a.foo = ‘bar’::text)
      -> Seq Scan on b (cost=0.00..23.30 rows=1330 width=33)
      -> Hash (cost=20.88..20.88 rows=4 width=65)
            -> Seq Scan on a (cost=0.00..20.88 rows=4 width=65)
                  Filter: (foo = ‘bar’::text)

    Query plans heavily depend on the profile of your dataset, your schema and server config, traffic on the table, all of which can change in time and between versions. Anything anyone can tell you about why these queries behave the way they do on your setup based on just the statement will be a pretty useless blind guess.

    If you instead ask your PostgreSQL to explain analyze, it’ll tell you exactly what it did, how and why, based on a very detailed knowledge of everything it depends on.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search