Why does UNION return less rows than one of its subrequests with MySQL?

philippe_b
October 17, 2023
135 views
0 votes
2 Answers

UNION can be used to merge the rows of two sub-requests:

SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;

By default, MySQL’s UNION is actually UNION DISTINCT, which means that the rows table1 and table2 have in common are de-duplicated. For example:

SELECT * FROM table_1;

--------
| name |
--------
| John |
| Mary |
--------

SELECT * FROM table_2;

---------
| name  |
---------
| John  |
| Steve |
---------

SELECT * FROM table1
UNION
SELECT * FROM table2;

---------
| name  |
---------
| John  |
| Mary  |
| Steve |
---------

Note that John is present only once in the final result, because MySQL de-duplicate this row.

We can say that UnionRows.length <= Table1Rows.length + Table2Rows.length.

However, I’m facing an unexpected behavior. I have UnionRows.length < Table1Rows.length. Which means that when I merge two sets, I get less rows than just taking one of the set. I would expect to have UnionRows.length >= Table1Rows.length and UnionRows.length >= Table2Rows.length.

Answers

Chosen as BEST ANSWER
- philippe_b
- October 17, 2023 at 10:02 am
- 0 votes
0
This is because UNION also de-duplicates rows in each sub-request. For example:
```
SELECT * FROM table_1;

--------
| name |
--------
| John |
| John |
| John |
--------

SELECT * FROM table_2;

---------
| name  |
---------
| Steve |
---------

SELECT * FROM table1
UNION
SELECT * FROM table2;

---------
| name  |
---------
| John  |
| Steve |
---------
```
Note that the final result has less rows than table_1.

(Edit)

By default ISO standard SQL of UNION operator induce the DISTINCT of the resulting subset, not a DISTINCT over each query that compound the all statement.

This allow all combination of DISTINCT at every level.

Demo…

CREATE TABLE T1 (name1 VARCHAR(16));
INSERT INTO T1 VALUES ('Fred'), ('Fred'), ('Dave');
CREATE TABLE T2 (name2 VARCHAR(16));
INSERT INTO T2 VALUES ('Dave'), ('John'), ('Dave');

-- no duplicate at all
SELECT * FROM T1
UNION 
SELECT * FROM T2

name1
----------------
Dave
Fred
John

-- duplicate only for T1
SELECT ALL * FROM T1
UNION ALL
SELECT DISTINCT * FROM T2

name1
----------------
Fred
Fred
Dave
Dave
John


-- duplicate only for T2
SELECT DISTINCT * FROM T1
UNION ALL
SELECT ALL * FROM T2

name1
----------------
Dave
Fred
Dave
John
Dave

-- duplicate for all
SELECT ALL * FROM T1
UNION ALL
SELECT ALL * FROM T2

name1
----------------
Fred
Fred
Dave
Dave
John
Dave

Remember that ALL is by default in SELECT, by DISTINCT is by default in set based operators like UNION, DISTINCT, EXCEPT…

Please signup or login to give your own answer.

Click here to cancel reply.