skip to Main Content

In the following case:

CREATE TABLE Persons (
    groupId int,
    age int,
    Person varchar(255)
);

insert into Persons (Person, groupId, age) values('Bob'  , 1     , 32);
insert into Persons (Person, groupId, age) values('Jill'  , 1     , 34);
insert into Persons (Person, groupId, age)values('Shawn'  , 1     , 42);
insert into Persons (Person, groupId, age) values('Shawn'  , 1     , 42);
insert into Persons (Person, groupId, age) values('Jake'  , 2     , 29);
insert into Persons (Person, groupId, age) values('Paul'  , 2     , 36);
insert into Persons (Person, groupId, age) values('Laura'  , 2     , 39);

The following query:

SELECT *
FROM `Persons` o                    
  LEFT JOIN `Persons` b             
      ON o.groupId = b.groupId AND o.age < b.age

returns (executed in http://sqlfiddle.com/#!9/cae8023/5):

1   32  Bob 1   34  Jill
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   42  Shawn   (null)  (null)  (null)
1   42  Shawn   (null)  (null)  (null)
2   29  Jake    2   36  Paul
2   29  Jake    2   39  Laura
2   36  Paul    2   39  Laura
2   39  Laura   (null)  (null)  (null). 

I don’t understand the result.
I was expecting

1   32  Bob 1   34  Jill
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   42  Shawn   (null)  (null)  (null)
2   29  Jake    2   36  Paul
2   29  Jake    2   39  Laura
2   39  Laura   (null)  (null)  (null)

Reason I was expecting that is that in my understanding the left join picks each row from the left table, tries to match it each row of the right table and if there is a match it adds the row. If there is no match in the condition it adds the left row with null values for the right columns.
So if that is correct why in the fiddle output we have after
1 34 Jill 1 42 Shawn
rows for Bob and Jill repeated?

2

Answers


  1. Your condition for joining rows is that the groupId is equal and o.age < b.age.

    Bob’s age is 32. That is less than Jill’s age of 34. It is also less than Shawn’s age of 42. So the condition is satisfied in two pairings of joined rows.

    The joined row has all the columns from the row referenced as o and all the columns from the row referenced as b.

    Note that you have entered two rows for Shawn. Bob’s row actually matches Jill’s row and both rows for Shawn. So you get three rows for Bob.


    When I test your query on my local MySQL instance (8.0.31), I get the result in the following order, which is different from your sqlfiddle’s result:

    +---------+------+--------+---------+------+--------+
    | groupId | age  | Person | groupId | age  | Person |
    +---------+------+--------+---------+------+--------+
    |       1 |   32 | Bob    |       1 |   42 | Shawn  |
    |       1 |   32 | Bob    |       1 |   42 | Shawn  |
    |       1 |   32 | Bob    |       1 |   34 | Jill   |
    |       1 |   34 | Jill   |       1 |   42 | Shawn  |
    |       1 |   34 | Jill   |       1 |   42 | Shawn  |
    |       1 |   42 | Shawn  |    NULL | NULL | NULL   |
    |       1 |   42 | Shawn  |    NULL | NULL | NULL   |
    |       2 |   29 | Jake   |       2 |   39 | Laura  |
    |       2 |   29 | Jake   |       2 |   36 | Paul   |
    |       2 |   36 | Paul   |       2 |   39 | Laura  |
    |       2 |   39 | Laura  |    NULL | NULL | NULL   |
    +---------+------+--------+---------+------+--------+
    

    Without an explicit ORDER BY clause, the default behavior of InnoDB is to return rows in the order they are read from the index. In this case, it’s using the primary key order for both tables, because there’s no other index to optimize the join. You can see that the order of columns from the left table match the primary key order.

    I’m not sure how to explain why the Bob-Shawn rows are before the Bob-Jill row, because that’s not primary key order for the joined table. It could be that the order is messed up in the join buffer while doing an unindexed join.

    The sqlfiddle might be doing something in the client that reorders rows.

    Login or Signup to reply.
  2. You inserted the record of (Shawn) twice. Your query should be :

    CREATE TABLE Persons (
        groupId int,
        age int,
        Person varchar(255)
    );
    
    insert into Persons (Person, groupId, age) values('Bob'  , 1     , 32);
    insert into Persons (Person, groupId, age) values('Jill'  , 1     , 34);
    insert into Persons (Person, groupId, age)values('Shawn'  , 1     , 42);
    insert into Persons (Person, groupId, age) values('Jake'  , 2     , 29);
    insert into Persons (Person, groupId, age) values('Paul'  , 2     , 36);
    insert into Persons (Person, groupId, age) values('Laura'  , 2     , 39);
    
    SELECT *
    FROM `Persons` o                    
      LEFT JOIN `Persons` b             
          ON o.groupId = b.groupId AND o.age < b.age
    ;
    

    This will gives you the following results

    1   32  Bob 1   34  Jill
    1   32  Bob 1   42  Shawn
    1   34  Jill    1   42  Shawn
    1   42  Shawn   (null)  (null)  (null)
    2   29  Jake    2   36  Paul
    2   29  Jake    2   39  Laura
    2   36  Paul    2   39  Laura
    2   39  Laura   (null)  (null)  (null)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search