I’m having results I don’t understand from a MySQL query that should be pretty simple.
The goal I want to achieve is more complex, but I have reduced the problem to this: When I am using NOT IN subquery in a WHERE clause, it even removes lines that are not included in the subquery. In fact I get an empty result.
Here are 3 simple queries that show my problem:
SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.deleted = 0
This one returns 10770 ids
SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.status = 'Not Started'
AND tas.deleted = 0
This one returns 2028 ids
I was expecting the last one to return 10770 – 2028 = 8742 ids
SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.parent_id NOT IN
(
SELECT DISTINCT tas2.parent_id
FROM tasks tas2
WHERE tas2.status = 'Not Started'
AND tas2.deleted = 0
)
AND tas.deleted = 0
But it returns an empty result.
I feel stupid because there is probably a silly mistake somewhere, and I don’t understand what it is.
Please note that I KNOW is twisted, I could have added tas.status != 'Not Started'
instead of relying on a subquery. But it is part of a more complex query, I only show the part that doesn’t make sense to me
Thank you for your help. Also this is my first time asking a question on Stack overflow so let me know if there are things I should change and keep in mind next time.
2
Answers
It looks like your
query is correct and should return the expected result
. However, there could be a few reasons why you’re getting an empty result:The subquery is returning all the parent_id values in the table. In
that case, your main query will return an empty result because there
are no parent_id values that are not in the subquery. You can check
this by running the subquery on its own and comparing the result to
the full list of parent_id values.
There could be NULL values in the parent_id column. When using the
NOT IN operator with a subquery, it’s important to check for NULL
values explicitly. For example, you can add
OR tas.parent_id IS NULL
to your query to include any NULL values in the result.As you mention that the query is part of a more complex query, are you sure you simplified it correctly here?
Are you sure you don’t have more conditions in the query with subselect?
I tested your scenario with the following setup:
Now, your query:
gets you
NULL
,1
,2
,3
,4
.The second query:
gets you
1
,2
,4
.Then your final query:
gives you
3
. This works as expected.The
NULL
handling in the example above is a special scenario (one might expect it in the result of the last query butNULL NOT IN
is always false).