Question:
I have the following table and data setup:
CREATE TABLE EMPLOYEE (
empId INTEGER AUTO_INCREMENT PRIMARY KEY,
name TEXT NOT NULL,
dept TEXT NOT NULL
);
INSERT INTO EMPLOYEE(name, dept) VALUES ('Clark', 'Sales');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Dave', 'Accounting');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Ava', 'Sales');
I am running this query:
SELECT *
FROM EMPLOYEE AS a
INNER JOIN EMPLOYEE b
ON a.empId = (SELECT MIN(b.empId));
Output:
+-------+-------+------------+-------+-------+------------+
| empId | name | dept | empId | name | dept |
+-------+-------+------------+-------+-------+------------+
| 1 | Clark | Sales | 1 | Clark | Sales |
| 2 | Dave | Accounting | 2 | Dave | Accounting |
| 3 | Ava | Sales | 3 | Ava | Sales |
+-------+-------+------------+-------+-------+------------+
I expect the query to return only the row with empId = 1, but instead, it returns all rows. Can someone explain why this is happening and how the INNER JOIN is evaluating the ON clause with the subquery?
What is the correct behavior here, and how can I fix this to only get the row with the smallest empId?
2
Answers
An SQL join needs to directly link the columns of two tables, "a" and "b", and may then further limit the rows by adding additional filters e.g.
Output:
The subquery is processed first by the DMS and must not reference the outer tables.
Using aggregation functions in the
ON
clause of a join doesn’t do anything useful, because in that position,b.empId
doesn’t refer to a set of rows, it only refers to one row at a time. SoMIN(b.empId)
is not looking for the minimum out of all values ofempId
, it’s only looking for the minimum of the single value ofb.empId
that is current at the time the join expression is evaluated, which happens repeatedly, one row at a time.For the same reason, you can’t use aggregate functions in the
WHERE
clause:The bottom line is that you may use aggregation functions in the select-list, or the
HAVING
clause, orORDER BY
clause, but not any other clause.You tried to use a subquery to put the
MIN(b.empId)
in a select-list, but sinceb.empId
refers to a value of the outer query, and your subquery doesn’t have aFROM
clause of its own, it is really a single value within the scope of one row being evaluated for the join.An easier way to get the employee with the least
empId
is: