skip to Main Content

UNION can be used to merge the rows of two sub-requests:

SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;

By default, MySQL’s UNION is actually UNION DISTINCT, which means that the rows table1 and table2 have in common are de-duplicated. For example:

SELECT * FROM table_1;

--------
| name |
--------
| John |
| Mary |
--------

SELECT * FROM table_2;

---------
| name  |
---------
| John  |
| Steve |
---------

SELECT * FROM table1
UNION
SELECT * FROM table2;

---------
| name  |
---------
| John  |
| Mary  |
| Steve |
---------

Note that John is present only once in the final result, because MySQL de-duplicate this row.

We can say that UnionRows.length <= Table1Rows.length + Table2Rows.length.

However, I’m facing an unexpected behavior. I have UnionRows.length < Table1Rows.length. Which means that when I merge two sets, I get less rows than just taking one of the set. I would expect to have UnionRows.length >= Table1Rows.length and UnionRows.length >= Table2Rows.length.

2

Answers


  1. Chosen as BEST ANSWER

    This is because UNION also de-duplicates rows in each sub-request. For example:

    SELECT * FROM table_1;
    
    --------
    | name |
    --------
    | John |
    | John |
    | John |
    --------
    
    SELECT * FROM table_2;
    
    ---------
    | name  |
    ---------
    | Steve |
    ---------
    
    SELECT * FROM table1
    UNION
    SELECT * FROM table2;
    
    ---------
    | name  |
    ---------
    | John  |
    | Steve |
    ---------
    

    Note that the final result has less rows than table_1.


  2. By default ISO standard SQL of UNION operator induce the DISTINCT of the resulting subset, not a DISTINCT over each query that compound the all statement.

    This allow all combination of DISTINCT at every level.

    Demo…

    CREATE TABLE T1 (name1 VARCHAR(16));
    INSERT INTO T1 VALUES ('Fred'), ('Fred'), ('Dave');
    CREATE TABLE T2 (name2 VARCHAR(16));
    INSERT INTO T2 VALUES ('Dave'), ('John'), ('Dave');
    
    -- no duplicate at all
    SELECT * FROM T1
    UNION 
    SELECT * FROM T2
    
    name1
    ----------------
    Dave
    Fred
    John
    
    -- duplicate only for T1
    SELECT ALL * FROM T1
    UNION ALL
    SELECT DISTINCT * FROM T2
    
    name1
    ----------------
    Fred
    Fred
    Dave
    Dave
    John
    
    
    -- duplicate only for T2
    SELECT DISTINCT * FROM T1
    UNION ALL
    SELECT ALL * FROM T2
    
    name1
    ----------------
    Dave
    Fred
    Dave
    John
    Dave
    
    -- duplicate for all
    SELECT ALL * FROM T1
    UNION ALL
    SELECT ALL * FROM T2
    
    name1
    ----------------
    Fred
    Fred
    Dave
    Dave
    John
    Dave
    

    Remember that ALL is by default in SELECT, by DISTINCT is by default in set based operators like UNION, DISTINCT, EXCEPT…

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search