I was actually trying some thing similar with a sub query, and couldn’t understand what was going wrong. I have managed to simplify the question to the following.
I have a simple table which may include a NULL
in one of the columns:
DROP TABLE IF EXISTS data;
CREATE TEMP TABLE data (
id INT PRIMARY KEY,
number INT,
string TEXT
);
INSERT INTO data(id,number,string)
VALUES (1,1,'Apple'),(2,1,'Accordion'),(3,2,'Banana'),(4,2,'Banjo'),(5,NULL,'Whatever');
SELECT * FROM data WHERE number IN(1,2,NULL);
SELECT * FROM data WHERE number NOT IN(1,2,NULL);
There is a live version at https://dbfiddle.uk/KhTzbX_E .
When I look for rows matching the number
column:
SELECT * FROM data WHERE number IN(1,2,NULL);
I get a few results, as expected. This doesn’t include the row where number
is NULL
, but I suppose that the IN
expression is short for WHERE a = b
.
If I look for the non-matches:
SELECT * FROM data WHERE number NOT IN(1,2,NULL);
I get nothing at all.
I can’t see how that can be right. The expression IN(1,2,NULL)
must return a valid list otherwise the first one wouldn’t work.
What is going on here, and is there a correct way to do this?
Note: I know it’s silly to put in the NULL
, but the idea is that the list is supposed to be a sub query which might return a vew NULL
s. I also know that I can filter out the NULL
s in the sub query. However that looks like a workaround to me.
I have tried this in PostgreSQL, MariaDB and Microsoft SQL Server.
4
Answers
Any equality check involving NULL will return null, and null does not equal null. Null needs special language using
is/is not
My recommendation would be to add another part to your where clause to make it work, and also keep it readable:
Try playing around with null and you’ll see how it behaves:
IN()
is a shortcut.number IN(1,2,NULL)
equates to:(number = 1 or number = 2 or number = NULL)
you cannot use equality against NULL you must use
IS NULL
instead, hence the IN shortcut won’t work as you expected with with NULLs.As you suggested, it is recommended that you strip out any NULLs if using a subquery.
Basically,
NOT IN
is of very limited use if anynull
values are involved. This isn’t exactly a new insight. See:translates to:
nr = null
is alwaysnull
, and sincefalse OR null
results innull
, the expression can never becometrue
. Used in theWHERE
clause, such a filter can never return any rows.This is an old trap for people who are not experienced with NULL: if a
NOT IN
list contains NULL, the result set is always empty.To understand why, let’s rewrite
to the semantically equivalent
Then note that
number <> NULL
will always return NULL. This is easiest to understand if you think of “NULL” as “unknown”: when asked if an unknown number is different from any given number, the answer could be “true” or “false”, depending on the unknown value of the unknown number. So the answer must be “unknown”, which is theboolean
value NULL.Now
something AND NULL
can beFALSE
(ifsomething
isFALSE
) or NULL (ifsomething
is NULL orTRUE
), but it never can beTRUE
. And aWHERE
condition only passes rows where the condition isTRUE
. BothFALSE
and NULL won’t pass.