Mysql - Find a group of names that share columns and then for each other name from group find in which other group it appears

MarioGalic
December 1, 2023
162 views
0 votes
2 Answers

Given the following table1 schema:


+---------+-----------------+------+-----+---------+----------------+
| Field   | Type            | Null | Key | Default | Extra          |
+---------+-----------------+------+-----+---------+----------------+
| ID      | int             | NO   | PRI | NULL    | auto_increment |
| NAME    | bigint unsigned | NO   | MUL | NULL    |                |
| SECONDS | int             | NO   | MUL | NULL    |                |
| LINK    | int             | YES  |     | NULL    |                |
+---------+-----------------+------+-----+---------+----------------+

and the following data in table1


+----+---------+---------+---------+
| ID | NAME    | SECONDS | LINK    |
+----+---------+---------+---------+
|  1 |       1 |       1 |       1 |
|  2 |       2 |       1 |       1 |
|  3 |       3 |       1 |       2 |
|  4 |       4 |       1 |       2 |
|  5 |       2 |       2 |       1 |
|  6 |       3 |       2 |       1 |
|  7 |       4 |       2 |       2 |
|  8 |       1 |       2 |       3 |
|  9 |       3 |       3 |       1 |
| 10 |       4 |       3 |       1 |
| 11 |       1 |       3 |       2 |
+----+---------+---------+---------+

return the following result


+---------+---------+---------+
| NAME    | SECONDS | LINK    |
+---------+---------+---------+
|       1 |       1 |       1 |
|       1 |       2 |       3 |
|       1 |       3 |       2 |
|       2 |       1 |       1 |
|       2 |       2 |       1 |
|       3 |       1 |       2 |
|       3 |       2 |       1 |
|       3 |       3 |       1 |
+---------+---------+---------+

The query should group all NAMEs that share the same SECONDS and LINK, for example

SELECT
  GROUP_CONCAT(DISTINCT NAME) AS NAME_list,
  SECONDS,
  LINK    
FROM
  table1
GROUP BY
  SECONDS,
  LINK    
HAVING
  COUNT(DISTINCT NAME) > 1;

returns

+--------------+---------+---------+
| NAME_list    | SECONDS | LINK    |
+--------------+---------+---------+
| 1,2          |       1 |       1 |
| 3,4          |       1 |       2 |
| 2,3          |       2 |       1 |
| 3,4          |       3 |       1 |
+--------------+---------+---------+

and then given an input list of NAMEs, say (1), for each group above where 1 appears, in our case 1,2, take the remaining NAMEs from the group, such as 2, and find all NAMEs where 2 appears in some other group.

Essentially the problem is to obtain transitive relationships two levels deep, that is since 1 connects to 2, and 2 connects to 3, we need to return NAMEs 1, 2, and 3. However we do not want to go another level deeper, that is, we are after

1 -> 2 -> 3          (two links)

but not

1 -> 2 -> 3 -> 4     (we don't need to return also link to 4)

even though based on the group 2,3 we know that 3 also connects to 4 as there exits a group 3,4.

In other words, given an input list of NAMEs, for each NAME, find all rows that are in transitive relationships two links deep A -> B -> C based on shared columns.

If possible how can this query be expressed in SQL?

Tags: mysql sql

Answers

Your sample data is a little too small to consider this interesting task.
But let’s try. Take a look at the example.

with nsl as(
select distinct name,concat(seconds,'-',link) SL
from test
)
,l2 as( -- NAMEs that share the same SECONDS and LINK (SL)
select t1.name name1,t2.name name2, t1.sl sl
from nsl t1
left join nsl t2 on t1.sl=t2.sl and t1.name<t2.name
where t2.name is not null
)
select t1.name1 name1,t1.name2 name2, t2.name2 name3,t1.sl sl1,t2.sl sl2 
from l2 t1
left join l2 t2 on 
    (t1.name2=t2.name1 and t2.sl<>t1.sl)
  or(t1.name2<t2.name2 and t2.sl=t1.sl)
where t2.name2 is not null
order by t1.name1

Result is

name1	name2	name3	sl1	sl2
1	2	3	1-1	2-1
2	3	4	2-1	1-2
2	3	4	2-1	3-1

To explain the idea of the solution, I will give the intermediate results of the queries

Subquery l2 result – NAMEs that share the same SECONDS and LINK (Very similar to the result of your query)

name1	name2	SL
1	2	1-1
3	4	1-2
2	3	2-1
3	4	3-1

I’ll add some test data

create table test(ID int,NAME int,SECONDS int,LINK int);
insert  into test values
 (1 ,1 ,1 ,1 )
,(2 ,2 ,1 ,1 )
,(3 ,3 ,1 ,2 )
,(4 ,4 ,1 ,2 )
,(5 ,2 ,2 ,1 )
,(6 ,3 ,2 ,1 )
,(7 ,4 ,2 ,2 )
,(8 ,1 ,2 ,3 )
,(9 ,3 ,3 ,1 )
,(10,4 ,3 ,1 )
,(11,1 ,3 ,2 )
-- added
,(12,5 ,13 ,12 )
,(13,6 ,13 ,12 )
,(14,7 ,13 ,12 )
,(15,7 ,20 ,30 )
,(16,8 ,20 ,30 )
;

There result is

name1	name2	name3	sl1	sl2
1	2	3	1-1	2-1
2	3	4	2-1	3-1
2	3	4	2-1	1-2
5	6	7	13-12	13-12
5	6	7	13-12	13-12
5	7	8	13-12	20-30
6	7	8	13-12	20-30

My approach would be to build this up using self-joins.

Step 1: Prepare a table of which rows have different NAME but the same SECONDS and LINK (A => B):

SELECT
  t1.ID AS leftID, t1.NAME AS leftNAME,
  t2.ID AS rightID, t2.NAME AS rightNAME
FROM table1 t1
INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
    AND t1.SECONDS = t2.SECONDS
    AND t1.LINK = t2.LINK

Results:

+--------+----------+---------+-----------+
| leftID | leftNAME | rightID | rightNAME |
+--------+----------+---------+-----------+
|      2 |        2 |       1 |         1 |
|      1 |        1 |       2 |         2 |
|      4 |        4 |       3 |         3 |
|      3 |        3 |       4 |         4 |
|      6 |        3 |       5 |         2 |
|      5 |        2 |       6 |         3 |
|     10 |        4 |       9 |         3 |
|      9 |        3 |      10 |         4 |
+--------+----------+---------+-----------+

Step 2: Then self-join this table to get two steps (A => B => C):

WITH transitions AS (
    SELECT
      t1.ID AS leftID, t1.NAME AS leftNAME,
      t2.ID AS rightID, t2.NAME AS rightNAME
    FROM table1 t1
    INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
        AND t1.SECONDS = t2.SECONDS
        AND t1.LINK = t2.LINK
)
SELECT 
    one_to_two.leftID AS oneID,
    two_to_three.leftID AS twoID,
    two_to_three.rightID AS threeID
FROM transitions one_to_two
INNER JOIN transitions two_to_three ON
    one_to_two.rightNAME = two_to_three.leftNAME
    AND one_to_two.leftNAME <> two_to_three.rightNAME

Results:

+-------+-------+---------+
| oneID | twoID | threeID |
+-------+-------+---------+
|     6 |     2 |       1 |
|     5 |     3 |       4 |
|     4 |     6 |       5 |
|    10 |     6 |       5 |
|     1 |     5 |       6 |
|     5 |     9 |      10 |
+-------+-------+---------+

Step 3: Then join once more to the original table to get any rows that match any of the transition steps:

WITH transitions AS (
    SELECT
      t1.ID AS leftID, t1.NAME AS leftNAME,
      t2.ID AS rightID, t2.NAME AS rightNAME
    FROM table1 t1
    INNER JOIN table1 t2 ON t1.NAME <> t2.NAME
        AND t1.SECONDS = t2.SECONDS
        AND t1.LINK = t2.LINK
)
SELECT DISTINCT table1.NAME, table1.SECONDS, table1.LINK
FROM table1
INNER JOIN (
    SELECT 
        one_to_two.leftID AS oneID,
        two_to_three.leftID AS twoID,
        two_to_three.rightID AS threeID
    FROM transitions one_to_two
    INNER JOIN transitions two_to_three ON
        one_to_two.rightNAME = two_to_three.leftNAME
        AND one_to_two.leftNAME <> two_to_three.rightNAME
) steps
    ON table1.ID = steps.oneID
    OR table1.ID = steps.twoID
    OR table1.ID = steps.threeID
ORDER BY table1.NAME, table1.SECONDS, table1.LINK

Results:

+------+---------+------+
| NAME | SECONDS | LINK |
+------+---------+------+
|    1 |       1 |    1 |
|    2 |       1 |    1 |
|    2 |       2 |    1 |
|    3 |       1 |    2 |
|    3 |       2 |    1 |
|    3 |       3 |    1 |
|    4 |       1 |    2 |
|    4 |       3 |    1 |
+------+---------+------+

This isn’t quite your desired result set (two rows are different) but I think is more accurate according to your description.

Please signup or login to give your own answer.

Click here to cancel reply.

Mysql – Find a group of names that share columns and then for each other name from group find in which other group it appears

Answers