skip to Main Content

Given the following table1 schema:


+---------+-----------------+------+-----+---------+----------------+
| Field   | Type            | Null | Key | Default | Extra          |
+---------+-----------------+------+-----+---------+----------------+
| ID      | int             | NO   | PRI | NULL    | auto_increment |
| NAME    | bigint unsigned | NO   | MUL | NULL    |                |
| SECONDS | int             | NO   | MUL | NULL    |                |
| LINK    | int             | YES  |     | NULL    |                |
+---------+-----------------+------+-----+---------+----------------+

and the following data in table1


+----+---------+---------+---------+
| ID | NAME    | SECONDS | LINK    |
+----+---------+---------+---------+
|  1 |       1 |       1 |       1 |
|  2 |       2 |       1 |       1 |
|  3 |       3 |       1 |       2 |
|  4 |       4 |       1 |       2 |
|  5 |       2 |       2 |       1 |
|  6 |       3 |       2 |       1 |
|  7 |       4 |       2 |       2 |
|  8 |       1 |       2 |       3 |
|  9 |       3 |       3 |       1 |
| 10 |       4 |       3 |       1 |
| 11 |       1 |       3 |       2 |
+----+---------+---------+---------+

return the following result


+---------+---------+---------+
| NAME    | SECONDS | LINK    |
+---------+---------+---------+
|       1 |       1 |       1 |
|       1 |       2 |       3 |
|       1 |       3 |       2 |
|       2 |       1 |       1 |
|       2 |       2 |       1 |
|       3 |       1 |       2 |
|       3 |       2 |       1 |
|       3 |       3 |       1 |
+---------+---------+---------+

The query should group all NAMEs that share the same SECONDS and LINK, for example

SELECT
  GROUP_CONCAT(DISTINCT NAME) AS NAME_list,
  SECONDS,
  LINK    
FROM
  table1
GROUP BY
  SECONDS,
  LINK    
HAVING
  COUNT(DISTINCT NAME) > 1;

returns

+--------------+---------+---------+
| NAME_list    | SECONDS | LINK    |
+--------------+---------+---------+
| 1,2          |       1 |       1 |
| 3,4          |       1 |       2 |
| 2,3          |       2 |       1 |
| 3,4          |       3 |       1 |
+--------------+---------+---------+

and then given an input list of NAMEs, say (1), for each group above where 1 appears, in our case 1,2, take the remaining NAMEs from the group, such as 2, and find all NAMEs where 2 appears in some other group.

Essentially the problem is to obtain transitive relationships two levels deep, that is since 1 connects to 2, and 2 connects to 3, we need to return NAMEs 1, 2, and 3. However we do not want to go another level deeper, that is, we are after

1 -> 2 -> 3          (two links)

but not

1 -> 2 -> 3 -> 4     (we don't need to return also link to 4)

even though based on the group 2,3 we know that 3 also connects to 4 as there exits a group 3,4.

In other words, given an input list of NAMEs, for each NAME, find all rows that are in transitive relationships two links deep A -> B -> C based on shared columns.

If possible how can this query be expressed in SQL?

2

Answers


  1. Your sample data is a little too small to consider this interesting task.
    But let’s try. Take a look at the example.

    with nsl as(
    select distinct name,concat(seconds,'-',link) SL
    from test
    )
    ,l2 as( -- NAMEs that share the same SECONDS and LINK (SL)
    select t1.name name1,t2.name name2, t1.sl sl
    from nsl t1
    left join nsl t2 on t1.sl=t2.sl and t1.name<t2.name
    where t2.name is not null
    )
    select t1.name1 name1,t1.name2 name2, t2.name2 name3,t1.sl sl1,t2.sl sl2 
    from l2 t1
    left join l2 t2 on 
        (t1.name2=t2.name1 and t2.sl<>t1.sl)
      or(t1.name2<t2.name2 and t2.sl=t1.sl)
    where t2.name2 is not null
    order by t1.name1
    

    Result is

    name1 name2 name3 sl1 sl2
    1 2 3 1-1 2-1
    2 3 4 2-1 1-2
    2 3 4 2-1 3-1

    To explain the idea of the solution, I will give the intermediate results of the queries

    Subquery l2 result – NAMEs that share the same SECONDS and LINK (Very similar to the result of your query)

    name1 name2 SL
    1 2 1-1
    3 4 1-2
    2 3 2-1
    3 4 3-1

    I’ll add some test data

    create table test(ID int,NAME int,SECONDS int,LINK int);
    insert  into test values
     (1 ,1 ,1 ,1 )
    ,(2 ,2 ,1 ,1 )
    ,(3 ,3 ,1 ,2 )
    ,(4 ,4 ,1 ,2 )
    ,(5 ,2 ,2 ,1 )
    ,(6 ,3 ,2 ,1 )
    ,(7 ,4 ,2 ,2 )
    ,(8 ,1 ,2 ,3 )
    ,(9 ,3 ,3 ,1 )
    ,(10,4 ,3 ,1 )
    ,(11,1 ,3 ,2 )
    -- added
    ,(12,5 ,13 ,12 )
    ,(13,6 ,13 ,12 )
    ,(14,7 ,13 ,12 )
    ,(15,7 ,20 ,30 )
    ,(16,8 ,20 ,30 )
    ;
    

    There result is

    name1 name2 name3 sl1 sl2
    1 2 3 1-1 2-1
    2 3 4 2-1 3-1
    2 3 4 2-1 1-2
    5 6 7 13-12 13-12
    5 6 7 13-12 13-12
    5 7 8 13-12 20-30
    6 7 8 13-12 20-30
    Login or Signup to reply.
  2. My approach would be to build this up using self-joins.

    Step 1: Prepare a table of which rows have different NAME but the same SECONDS and LINK (A => B):

    SELECT
      t1.ID AS leftID, t1.NAME AS leftNAME,
      t2.ID AS rightID, t2.NAME AS rightNAME
    FROM table1 t1
    INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
        AND t1.SECONDS = t2.SECONDS
        AND t1.LINK = t2.LINK
    

    Results:

    +--------+----------+---------+-----------+
    | leftID | leftNAME | rightID | rightNAME |
    +--------+----------+---------+-----------+
    |      2 |        2 |       1 |         1 |
    |      1 |        1 |       2 |         2 |
    |      4 |        4 |       3 |         3 |
    |      3 |        3 |       4 |         4 |
    |      6 |        3 |       5 |         2 |
    |      5 |        2 |       6 |         3 |
    |     10 |        4 |       9 |         3 |
    |      9 |        3 |      10 |         4 |
    +--------+----------+---------+-----------+
    

    Step 2: Then self-join this table to get two steps (A => B => C):

    WITH transitions AS (
        SELECT
          t1.ID AS leftID, t1.NAME AS leftNAME,
          t2.ID AS rightID, t2.NAME AS rightNAME
        FROM table1 t1
        INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
            AND t1.SECONDS = t2.SECONDS
            AND t1.LINK = t2.LINK
    )
    SELECT 
        one_to_two.leftID AS oneID,
        two_to_three.leftID AS twoID,
        two_to_three.rightID AS threeID
    FROM transitions one_to_two
    INNER JOIN transitions two_to_three ON
        one_to_two.rightNAME = two_to_three.leftNAME
        AND one_to_two.leftNAME <> two_to_three.rightNAME
    

    Results:

    +-------+-------+---------+
    | oneID | twoID | threeID |
    +-------+-------+---------+
    |     6 |     2 |       1 |
    |     5 |     3 |       4 |
    |     4 |     6 |       5 |
    |    10 |     6 |       5 |
    |     1 |     5 |       6 |
    |     5 |     9 |      10 |
    +-------+-------+---------+
    

    Step 3: Then join once more to the original table to get any rows that match any of the transition steps:

    WITH transitions AS (
        SELECT
          t1.ID AS leftID, t1.NAME AS leftNAME,
          t2.ID AS rightID, t2.NAME AS rightNAME
        FROM table1 t1
        INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
            AND t1.SECONDS = t2.SECONDS
            AND t1.LINK = t2.LINK
    )
    SELECT DISTINCT table1.NAME, table1.SECONDS, table1.LINK
    FROM table1
    INNER JOIN (
        SELECT 
            one_to_two.leftID AS oneID,
            two_to_three.leftID AS twoID,
            two_to_three.rightID AS threeID
        FROM transitions one_to_two
        INNER JOIN transitions two_to_three ON
            one_to_two.rightNAME = two_to_three.leftNAME
            AND one_to_two.leftNAME <> two_to_three.rightNAME
    ) steps
        ON table1.ID = steps.oneID
        OR table1.ID = steps.twoID
        OR table1.ID = steps.threeID
    ORDER BY table1.NAME, table1.SECONDS, table1.LINK
    

    Results:

    +------+---------+------+
    | NAME | SECONDS | LINK |
    +------+---------+------+
    |    1 |       1 |    1 |
    |    2 |       1 |    1 |
    |    2 |       2 |    1 |
    |    3 |       1 |    2 |
    |    3 |       2 |    1 |
    |    3 |       3 |    1 |
    |    4 |       1 |    2 |
    |    4 |       3 |    1 |
    +------+---------+------+
    

    This isn’t quite your desired result set (two rows are different) but I think is more accurate according to your description.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search