I have input data
# | column1 | column2
--+---------+---------
1 | test1 | test2
2 | test3 | test4
3 | test5 | test4
4 | test3 | test2
5 | test5 | test2
6 | test5 | test6
7 | test3 | test6
and I need SQL query (Postgres) that finds all matches and uses each value only once on both sides.
For above example correct answer is:
# | column1 | column2
--+---------+---------
1 | test1 | test2
2 | test3 | test4
6 | test5 | test6
or
# | column1 | column2
--+---------+---------
1 | test1 | test2
3 | test5 | test4
7 | test3 | test6
With window functions, initial idea is sorting records by number of potential candidates to allow rows with fewer candidates to find a match.
select count(*) over (partition by column2), *
from my_table
order by count(*) over (partition by column2);
I don’t know how to pass information that value from column2
is already used to next frames. Any ideas?
2
Answers
You can use
DENSE_RANK
on both columns and fetch those rows where the two values equal.The function counts up the occurence of distinct values in a column. So it will increase whenever a value appears for the first time, otherwise the previous value will be set. Only in case both "rankings" equal, in both columns a new value appears. Then those rows should be selected.
This query…
…produces following result for your sample data:
So following entire query…
… will produce this result:
See this db<>fiddle with your sample data.
You can find distinct of each column i.e column1 and column2 and then assign a rank to them.
These ranks are then joined and id is fetched from the original input checking the rows that matched.
Fiddle
Output