Postgresql - Select the best matching for set of rows and use each value only once, e.g. unique paring

AndriiParfonov
December 15, 2024
69 views
1 vote
2 Answers

I have input data

# | column1 | column2
--+---------+---------
1 |  test1  |  test2
2 |  test3  |  test4
3 |  test5  |  test4
4 |  test3  |  test2
5 |  test5  |  test2
6 |  test5  |  test6
7 |  test3  |  test6

and I need SQL query (Postgres) that finds all matches and uses each value only once on both sides.

For above example correct answer is:

# | column1 | column2
--+---------+---------
1 |  test1  |  test2
2 |  test3  |  test4
6 |  test5  |  test6

# | column1 | column2
--+---------+---------
1 |  test1  |  test2
3 |  test5  |  test4
7 |  test3  |  test6

With window functions, initial idea is sorting records by number of potential candidates to allow rows with fewer candidates to find a match.

select count(*) over (partition by column2), * 
from my_table
order by count(*) over (partition by column2);

I don’t know how to pass information that value from column2 is already used to next frames. Any ideas?

Tags: postgresql sql

Answers

- JonasMetzler
- December 15, 2024 at 2:08 pm
- 0 votes
0
You can use DENSE_RANK on both columns and fetch those rows where the two values equal.

The function counts up the occurence of distinct values in a column. So it will increase whenever a value appears for the first time, otherwise the previous value will be set. Only in case both "rankings" equal, in both columns a new value appears. Then those rows should be selected.

This query…
```
SELECT 
  id,
  column1,
  column2,
  DENSE_RANK() OVER(ORDER BY column1) AS rn1,
  DENSE_RANK() OVER(ORDER BY column2) AS rn2
FROM my_table;
```
…produces following result for your sample data:

id column1 column2 rn1 rn2

1 test1 test2 1 1

2 test3 test4 2 2

4 test3 test2 2 1

7 test3 test6 2 3

3 test5 test4 3 2

5 test5 test2 3 1

6 test5 test6 3 3

So following entire query…
```
WITH sub AS
(SELECT 
  id,
  column1,
  column2,
  DENSE_RANK() OVER(ORDER BY column1) AS rn1,
  DENSE_RANK() OVER(ORDER BY column2) AS rn2
FROM my_table)
SELECT 
  id, 
  column1, 
  column2 
FROM sub 
WHERE rn1 = rn2
ORDER BY 
  column1, 
  column2;
```
… will produce this result:

id column1 column2

1 test1 test2

2 test3 test4

6 test5 test6

See this db<>fiddle with your sample data.
Login or Signup to reply.

- samhita
- December 15, 2024 at 2:13 pm
- 0 votes
0
You can find distinct of each column i.e column1 and column2 and then assign a rank to them.

These ranks are then joined and id is fetched from the original input checking the rows that matched.

Fiddle
```
WITH distinct_column1 AS (
    SELECT column1, row_number() OVER (ORDER BY column1) AS rn
    FROM (SELECT DISTINCT column1 FROM sample_data) AS distinct_values1
),
distinct_column2 AS (
    SELECT column2, row_number() OVER (ORDER BY column2) AS rn
    FROM (SELECT DISTINCT column2 FROM sample_data) AS distinct_values2
)
SELECT t.id AS id, d1.column1, d2.column2
FROM distinct_column1 d1
JOIN distinct_column2 d2
  ON d1.rn = d2.rn
JOIN sample_data t
  ON t.column1 = d1.column1 AND t.column2 = d2.column2
ORDER BY id; 
```
Output

id column1 column2

1 test1 test2

2 test3 test4

6 test5 test6
Login or Signup to reply.

id	column1	column2	rn1	rn2
1	test1	test2	1	1
2	test3	test4	2	2
4	test3	test2	2	1
7	test3	test6	2	3
3	test5	test4	3	2
5	test5	test2	3	1
6	test5	test6	3	3

id	column1	column2
1	test1	test2
2	test3	test4
6	test5	test6

id	column1	column2
1	test1	test2
2	test3	test4
6	test5	test6

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Select the best matching for set of rows and use each value only once, e.g. unique paring

Answers