skip to Main Content

When I ran this sql in MySQL, it didn’t get any error. However, entity_id doesn’t exist in table b.

select
  entity_id
from
  a
where
  entity_type = 2
  and is_deleted = 0
  and entity_id in (
    select
      strategy_id
    from
      b
    where 
      strategy_type = 1 
      and entity_id != 2
)

When I ran the subquery separately, it reported Unknown column 'entity_id' in 'where clause', which is as expected.

In addition, I changed entity_id to a messed up name(for example, xaasaz) in the subquery. And when I rerun the entire SQL, it reported the Unknown column error immediately.

MySQL version is 5.7.23-23-log

Why did this happen? Thank you for your help.

3

Answers


  1. There is an alias/table reference trick at work here. Here is your query repeated:

    SELECT entity_id
    FROM a
    WHERE
        entity_type = 2 AND
        is_deleted = 0 AND
        entity_id IN (
            SELECT strategy_id
            FROM b
            WHERE strategy_type = 1 AND
                  entity_id != 2
    );
    

    What appears to be happening in the subquery is that MySQL, failing to find any column in the b table called entity_id, is looking in the a table.

    That is to say, it appears that your query is actually being evaluated as:

    SELECT entity_id
    FROM a
    WHERE
        entity_type = 2 AND
        is_deleted = 0 AND
        entity_id IN (
            SELECT strategy_id
            FROM b
            WHERE strategy_type = 1 AND
                  a.entity_id != 2        -- note the alias here
    );
    

    Note carefully that a.entity_id appears in the subquery. When you run the subquery by itself, MySQL can only possibly reference the entity_id column to the b table, which of course fails.

    Login or Signup to reply.
  2. This behavior is due to MySQL’s handling of the outer and inner query column references, which can sometimes lead to confusing results. In MySQL, if a column specified in the subquery isn’t found within that subquery’s table (b in your case), MySQL will attempt to resolve it by looking up the outer query’s table (a). This phenomenon is known as name resolution, and it may produce misleading or unexpected results.

    Here’s a breakdown of what’s happening in your query:

    1. In the subquery:

    select strategy_id from b where strategy_type = 1 and entity_id != 2

    Since entity_id is not a column in table b, MySQL tries to resolve entity_id by looking in the outer query’s table a. If entity_id exists in table a, it implicitly refers to a.entity_id instead. MySQL doesn’t throw an error because it assumes entity_id is from the outer scope.

    1. When you run the subquery independently:

    MySQL throws an error because, without the outer query context, it can’t resolve entity_id in table b, resulting in the "Unknown column" error as expected.

    1. Changing entity_id in the subquery to a nonsensical name (like
      xaasaz) results in an immediate error. This happens because MySQL
      can’t resolve xaasaz in either the inner query or outer query
      tables, and it throws an error instead of assuming cross-scope name
      resolution.

    To avoid these issues, make sure column names in subqueries are either correctly scoped or explicitly qualified. This can prevent confusion and make your SQL more readable and reliable.

    Here’s how you might fix your query to prevent MySQL from misinterpreting columns:

    SELECT entity_id
    FROM a
    WHERE entity_type = 2
      AND is_deleted = 0
      AND entity_id IN (
        SELECT strategy_id
        FROM b
        WHERE strategy_type = 1
          AND b.entity_id != 2  -- Qualify 'entity_id' to avoid outer scope resolution
      );
    

    This approach ensures that each table is correctly referenced, eliminating potential misinterpretation of column names across different query scopes.

    Login or Signup to reply.
  3. When the server processes the query, it collects all columns which are visible in the current context, current point of the query. All names are expanded up to complete names, i.e. database.table.column for tables and views, table.column for subqueries, column for output list columns and joining columns which are used in USING joining condition or in NATURAL joining, and name for local variables.

    When the server meets with partially specified name then it searches for it in the columns list which are available/visible in the current code point. Firstly the name is searched in the local variables names, and when the mathcing value is found then it is used unconditionally, otherwise the columns names are scanned. If only one matched column name is found then it is used, if more the one matched name is found then ‘ambiguous name’ error is generated, and when no matching name is found then ‘unknown column’ eror is generated.


    In your particular case the partial column name entity_id cannot be found in the columns list of database.b table but it is present in the columns list of the database.a table which is visible in current code point. This is the only matching name – so it is used.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search