skip to Main Content

I was actually trying some thing similar with a sub query, and couldn’t understand what was going wrong. I have managed to simplify the question to the following.

I have a simple table which may include a NULL in one of the columns:

DROP TABLE IF EXISTS data;
CREATE TEMP TABLE data (
    id INT PRIMARY KEY,
    number INT,
    string TEXT
);
INSERT INTO data(id,number,string)
VALUES (1,1,'Apple'),(2,1,'Accordion'),(3,2,'Banana'),(4,2,'Banjo'),(5,NULL,'Whatever');
SELECT * FROM data WHERE number IN(1,2,NULL);
SELECT * FROM data WHERE number NOT IN(1,2,NULL);

There is a live version at https://dbfiddle.uk/KhTzbX_E .

When I look for rows matching the number column:

SELECT * FROM data WHERE number IN(1,2,NULL);

I get a few results, as expected. This doesn’t include the row where number is NULL, but I suppose that the IN expression is short for WHERE a = b.

If I look for the non-matches:

SELECT * FROM data WHERE number NOT IN(1,2,NULL);

I get nothing at all.

I can’t see how that can be right. The expression IN(1,2,NULL) must return a valid list otherwise the first one wouldn’t work.

What is going on here, and is there a correct way to do this?

Note: I know it’s silly to put in the NULL, but the idea is that the list is supposed to be a sub query which might return a vew NULLs. I also know that I can filter out the NULLs in the sub query. However that looks like a workaround to me.

I have tried this in PostgreSQL, MariaDB and Microsoft SQL Server.

4

Answers


  1. Any equality check involving NULL will return null, and null does not equal null. Null needs special language using is/is not

    My recommendation would be to add another part to your where clause to make it work, and also keep it readable:

    SELECT * FROM data WHERE (number NOT IN(1,2) and number is not null);
    SELECT * FROM data WHERE (number IN (1,2)  or number is null);
    

    Try playing around with null and you’ll see how it behaves:

    select case when null = null then 1 else 0 end as test
    
    Login or Signup to reply.
  2. IN() is a shortcut. number IN(1,2,NULL) equates to:

    (number = 1 or number = 2 or number = NULL)

    you cannot use equality against NULL you must use IS NULL instead, hence the IN shortcut won’t work as you expected with with NULLs.

    As you suggested, it is recommended that you strip out any NULLs if using a subquery.

    Login or Signup to reply.
  3. Basically, NOT IN is of very limited use if any null values are involved. This isn’t exactly a new insight. See:

    nr NOT IN (1,2,NULL)
    

    translates to:

    NOT (nr = 1 OR nr = 2 or nr = null)
    

    nr = null is always null, and since false OR null results in null, the expression can never become true. Used in the WHERE clause, such a filter can never return any rows.

    Login or Signup to reply.
  4. This is an old trap for people who are not experienced with NULL: if a NOT IN list contains NULL, the result set is always empty.

    To understand why, let’s rewrite

    WHERE number NOT IN (1, 2, NULL)
    

    to the semantically equivalent

    WHERE number <> 1 AND number <> 2 AND number <> NULL
    

    Then note that number <> NULL will always return NULL. This is easiest to understand if you think of “NULL” as “unknown”: when asked if an unknown number is different from any given number, the answer could be “true” or “false”, depending on the unknown value of the unknown number. So the answer must be “unknown”, which is the boolean value NULL.

    Now something AND NULL can be FALSE (if something is FALSE) or NULL (if something is NULL or TRUE), but it never can be TRUE. And a WHERE condition only passes rows where the condition is TRUE. Both FALSE and NULL won’t pass.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search