I have 2 tables. t1.uuid
has 630,000 distinct values. t2.uuid
has 300,000 distinct values.
When I run
SELECT
t1.uuid
, t2.uuid
FROM
t1 --630,000 uuids
LEFT OUTER JOIN t2 -- 300,000 uuids
ON t1.uuid = t2.uuid
WHERE
t2.uuid IS NULL
There are no results.
Edit1:
For clarification:
t1 looks like:
uuid |
---|
ufo123 |
abc456 |
def789 |
t2 looks like
uuid |
---|
ufo123 |
def789 |
Every record in t2 has a match in t1. Not every record in t1 has a match in t2 (as indicated by the size of the tables.
I expect the join to result in:
t1.uuid | t2.uuid |
---|---|
ufo123 | ufo123 |
abc456 | NULL |
def789 | def789 |
And the result of my query to by:
t1.uuid | t2.uuid |
---|---|
abc456 | NULL |
But instead I get no results. I’m not sure if there is a communication issue between Redshift and DBEAVER to produce this behavior.
Edit2:
I ran the following which resulted in only 300,000 records (same as only t2):
SELECT
COUNT(DISTINCT t1.uuid)
FROM
t1 --630,000 uuids
FULL JOIN t2 -- 300,000 uuids
ON t1.uuid = t2.uuid
This is completely wrong.
2
Answers
The problem is that you cannot check NULL with equal sign in Redshift. "NULL = NULL" is false.
You need to expand your JOIN ON clause (I left it general case which isn’t needed with you WHERE clause):
The problem is that when both t1 and t2 have NULL uuid these will all join to each other and could explode your results. So you may want to check this in a different way if there are a lot of NULLs.
As for an explanation let me address the "why" of this. This has to do with the Redshift inferring a WHERE clause from your SQL. It goes like this – if t2.uuid = X and t1.uuid = t2.uuid then Redshift should be able to apply the WHERE clause t1.uuid = X. But this produces the null set give the reasons above. Redshift does this to reduce the scanned data as much as possible.
Update:
Note that you are recreating the EXCEPT clause with this LEFT OUTER JOIN. You will likely get better results with EXCEPT.
Your logic looks fine. Testing on a sample data. 3 records in
table a
and 1 record intable b
. Output is 2 records that don’t match.Output
I’d recommend checking the data that it’s matching on. As Pelayo Martinez mentioned, it’s highly likely your LEFT table is T2 instead of T1. Only then it’ll be a no row result.