skip to Main Content

Question:
I have the following table and data setup:

CREATE TABLE EMPLOYEE (
  empId INTEGER AUTO_INCREMENT PRIMARY KEY,
  name TEXT NOT NULL,
  dept TEXT NOT NULL
);

INSERT INTO EMPLOYEE(name, dept) VALUES ('Clark', 'Sales');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Dave', 'Accounting');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Ava', 'Sales');

I am running this query:

SELECT * 
FROM EMPLOYEE AS a 
INNER JOIN EMPLOYEE b 
  ON a.empId = (SELECT MIN(b.empId));

Output:

+-------+-------+------------+-------+-------+------------+
| empId | name  | dept       | empId | name  | dept       |
+-------+-------+------------+-------+-------+------------+
|     1 | Clark | Sales      |     1 | Clark | Sales      |
|     2 | Dave  | Accounting |     2 | Dave  | Accounting |
|     3 | Ava   | Sales      |     3 | Ava   | Sales      |
+-------+-------+------------+-------+-------+------------+

I expect the query to return only the row with empId = 1, but instead, it returns all rows. Can someone explain why this is happening and how the INNER JOIN is evaluating the ON clause with the subquery?

What is the correct behavior here, and how can I fix this to only get the row with the smallest empId?

2

Answers


  1. An SQL join needs to directly link the columns of two tables, "a" and "b", and may then further limit the rows by adding additional filters e.g.

    SELECT * 
    FROM EMPLOYEE AS a 
    INNER JOIN EMPLOYEE b 
      ON a.empId = b.empId 
      AND a.empId = (SELECT MIN(empId) FROM EMPLOYEE);
    

    Output:

    1   Clark   Sales   1   Clark   Sales
    

    The subquery is processed first by the DMS and must not reference the outer tables.

    Login or Signup to reply.
  2. Using aggregation functions in the ON clause of a join doesn’t do anything useful, because in that position, b.empId doesn’t refer to a set of rows, it only refers to one row at a time. So MIN(b.empId) is not looking for the minimum out of all values of empId, it’s only looking for the minimum of the single value of b.empId that is current at the time the join expression is evaluated, which happens repeatedly, one row at a time.

    For the same reason, you can’t use aggregate functions in the WHERE clause:

    ... WHERE a.empId = MIN(b.empId) -- ERROR
    

    The bottom line is that you may use aggregation functions in the select-list, or the HAVING clause, or ORDER BY clause, but not any other clause.

    You tried to use a subquery to put the MIN(b.empId) in a select-list, but since b.empId refers to a value of the outer query, and your subquery doesn’t have a FROM clause of its own, it is really a single value within the scope of one row being evaluated for the join.

    An easier way to get the employee with the least empId is:

    SELECT * 
    FROM EMPLOYEE AS a 
    ORDER BY empId
    LIMIT 1
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search