Find all records with same join column data - Postgresql

user09
April 26, 2023
316 views
0 votes
2 Answers

I have these tables:

Project

id	name	version
1	Swam	0.0.1
2	Dinali	0.0.1
3	Dinali	0.0.2
4	BigR	0.0.3
5	Kale	0.0.1
6	Kale	0.0.2

Person

id	name
1	Jake
2	Skye
3	Kieth
4	Jim
5	Eliz
6	Haun

Person_Project

id	person_id	project_id
1	1	1
2	2	1
3	2	2
4	3	1
5	3	2
6	4	1
7	4	4
8	5	1
9	6	1
10	6	2
11	6	3

I want to find all the persons who are working on same projects exact match. From the above data, result should be persons 1 and 5 because they both working on project 1, persons 2 and 3 because they are working on same projects 1 and 2.
Should not return 4 and 6 as there are no other persons working exact projects they are working on.

Answers

Query:


 with projects as
  (
   select person_id, STRING_AGG (project_id::varchar,' , ' order by project_id ) project_ids from Person_Project
   group by Person_id
  )
  ,cte as
   (select project_ids from projects group by project_ids having count(*)>1)
  select person_id from projects where project_ids in (select project_ids from cte)

OUtput:

person_id
1
2
3
5

fiddle

- ErwinBrandstetter
- April 27, 2023 at 2:39 am
- 0 votes
0
This can be optimized.

2x subquery, 1x window function
```
SELECT person_id
FROM  (
   SELECT person_id, count(*) OVER (PARTITION BY projects) AS match_ct
   FROM  (
      SELECT person_id, array_agg(project_id) AS projects
      FROM   person_project
      GROUP  BY 1
      ) sub1
   ) sub2
WHERE  match_ct > 1;
```
1x CTE, 1x EXISTS
```
WITH cte AS (
   SELECT person_id, array_agg(project_id) AS projects
   FROM   person_project
   GROUP  BY 1
   )
SELECT person_id
FROM   cte c1
WHERE  EXISTS (
   SELECT FROM cte c2
   WHERE  c2.projects = c1.projects
   AND    c2.person_id <> c1.person_id
   );
```
In addition, array_agg() should be faster that string_agg() (also avoiding the cast).

For large numbers of persons and/or projects, it should pay to create a temporary table instead of the CTE, add an index on (hash_array(projects), person_id), and compare the hash values.

For very large sets, use hash_array_extended(projects, 0) to practically rule out hash collisions.

Related:
- Why doesn’t my UNIQUE constraint trigger?
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Find all records with same join column data – Postgresql

Answers

2x subquery, 1x window function

1x CTE, 1x `EXISTS`

Find all records with same join column data – Postgresql

Answers

2x subquery, 1x window function

1x CTE, 1x EXISTS

1x CTE, 1x `EXISTS`