I have some tables with properties of Name and Result each.
i want to join tables by name similarity but limit the result arrived from each group of similarity into max 2 results.
CREATE TABLE data_a(
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Result INT4 NOT NULL);
CREATE TABLE data_b(
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Result INT4 NOT NULL);
INSERT INTO data_a
(Name, Result)
VALUES
('Todd', 2),
('John', 5);
INSERT INTO data_b
(Name, Result)
VALUES
('Johns', 5),
('Todi', 3),
('Tod', 4),
('Todd', 5),
('John', 1),
('Jon', 1),
('Johny', 1),
('Johnny', 1),
('Johni', 1);
i would like to run a query that join both tables by name similarity and limit results to up to 2 results
SELECT da.Name as Name_a,db.Name , similarity(da.Name,da.Name) > 0.5
FROM data_a da
JOIN data_b db
ON da.Name % db.Name
GROUP BY da.Name,db.Name
ORDER BY similarity
LIMIT 2
and recive
|Name_a|Name_b|similarity|
|------|------|----------|
|Todd | Todd | 1 |
|------|------|----------|
|Todd | Tod |0.8 |
|------|------|----------|
|John | John |1 |
|------|------|----------|
|John | Johny|0.76 |
|------|------|----------|
currently i get
|Name_a|Name_b|similarity|
|------|------|----------|
|Todd | Todd | 1 |
|John | John |1 |
it seems that i’m not using correctly in group by , how can i group by
2
Answers
You can use lateral join to map each name to a name from the first table, then truncate the result to 2 values for each name.
Demo in sql<>daddy.io
If you want to apply the LIMIT separately per da.Name, you would have to do it in a LATERAL subquery: