skip to Main Content

I’m having results I don’t understand from a MySQL query that should be pretty simple.
The goal I want to achieve is more complex, but I have reduced the problem to this: When I am using NOT IN subquery in a WHERE clause, it even removes lines that are not included in the subquery. In fact I get an empty result.

Here are 3 simple queries that show my problem:

SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.deleted = 0

This one returns 10770 ids

SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.status = 'Not Started'
AND tas.deleted = 0

This one returns 2028 ids

I was expecting the last one to return 10770 – 2028 = 8742 ids

SELECT DISTINCT tas.parent_id
FROM tasks tas
WHERE tas.parent_id NOT IN
(
SELECT DISTINCT tas2.parent_id
    FROM tasks tas2
    WHERE tas2.status = 'Not Started'
    AND tas2.deleted = 0
)
AND tas.deleted = 0

But it returns an empty result.

I feel stupid because there is probably a silly mistake somewhere, and I don’t understand what it is.

Please note that I KNOW is twisted, I could have added tas.status != 'Not Started' instead of relying on a subquery. But it is part of a more complex query, I only show the part that doesn’t make sense to me

Thank you for your help. Also this is my first time asking a question on Stack overflow so let me know if there are things I should change and keep in mind next time.

2

Answers


  1. It looks like your query is correct and should return the expected result. However, there could be a few reasons why you’re getting an empty result:

    • The subquery is returning all the parent_id values in the table. In
      that case, your main query will return an empty result because there
      are no parent_id values that are not in the subquery. You can check
      this by running the subquery on its own and comparing the result to
      the full list of parent_id values.

    • There could be NULL values in the parent_id column. When using the
      NOT IN operator with a subquery, it’s important to check for NULL
      values explicitly. For example, you can add OR tas.parent_id IS NULL to your query to include any NULL values in the result.

    Login or Signup to reply.
  2. As you mention that the query is part of a more complex query, are you sure you simplified it correctly here?

    Are you sure you don’t have more conditions in the query with subselect?

    I tested your scenario with the following setup:

    CREATE TABLE IF NOT EXISTS `tasks` (
      `id` int(6) unsigned NOT NULL,
      `parent_id` int(6) unsigned NULL,
      `deleted` int(3) unsigned NOT NULL,
      `status` varchar(200) NOT NULL,
      PRIMARY KEY (`id`)
    ) DEFAULT CHARSET=utf8;
    
    INSERT INTO `tasks` (`id`, `parent_id`, `deleted`, `status`) VALUES
      ('1', NULL, 0, 'In Progress'),
      ('2', '1',  1, 'Done'),
      ('3', '1', 1, 'Not Started'),
      ('4', '1', 0, 'Not Started'),
      ('5', '2', 0, 'Not Started'),
      ('6', '2', 0, 'Not Started'),
      ('7', '2', 0, 'Not Started'),
      ('8', '3', 0, 'Done'),
      ('9', '4', 0, 'Not Started');
    
    

    Now, your query:

    SELECT DISTINCT tas.parent_id
    FROM tasks tas
    WHERE tas.deleted = 0
    

    gets you NULL, 1, 2, 3, 4.

    The second query:

    SELECT DISTINCT tas.parent_id
    FROM tasks tas
    WHERE tas.status = 'Not Started'
    AND tas.deleted = 0
    

    gets you 1, 2, 4.

    Then your final query:

    SELECT DISTINCT tas.parent_id
    FROM tasks tas
    WHERE tas.parent_id NOT IN
    (
        SELECT DISTINCT tas2.parent_id
        FROM tasks tas2
        WHERE tas2.status = 'Not Started'
        AND tas2.deleted = 0
    )
    AND tas.deleted = 0
    

    gives you 3. This works as expected.

    The NULL handling in the example above is a special scenario (one might expect it in the result of the last query but NULL NOT IN is always false).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search