I’m trying to figure out how exactly ON
and WHERE
work when doing a JOIN
My understanding based on other questions is that I should filter in where
when the condition is on the left side of the join, and in the on
clause when filtering on the right side
select * from A left join B
ON A.something=B.something AND {some_condition_on_B}
WHERE {some_condition_on_A}
I have encountered an issue with a partitioned A table
this select is very slow :
select * from
A
left join
B ON A.something=B.something
WHERE {A.foo = bar}
while this makes the query faster:
select * from
A
left join B ON A.something=B.something AND {A.foo = bar}
WHERE {A.foo = bar}
My question is how come the second select is faster ? Is it something I’m missing, because it seems that it shouldn’t make a difference
2
Answers
This joins the tables together and anything which matches A.something = B.something will join. Anything else will have a bunch of null values (more data to be transfered before the final filter)
This one joins the tables together, but will result in more null records if nothing was found which matches the condition {A.foo = bar} as well as A.Something = B.something. So less data is being transferred around before we get to the Where condition.
Both at the end are checking if they are equal (a null will never equals what it is being compared against, even another null)
Don’t ask people to explain, ask your PostgreSQL to
explain
: demoQuery plans heavily depend on the profile of your dataset, your schema and server config, traffic on the table, all of which can change in time and between versions. Anything anyone can tell you about why these queries behave the way they do on your setup based on just the statement will be a pretty useless blind guess.
If you instead ask your PostgreSQL to
explain analyze
, it’ll tell you exactly what it did, how and why, based on a very detailed knowledge of everything it depends on.