skip to Main Content

I have these tables:

Project

id name version
1 Swam 0.0.1
2 Dinali 0.0.1
3 Dinali 0.0.2
4 BigR 0.0.3
5 Kale 0.0.1
6 Kale 0.0.2

Person

id name
1 Jake
2 Skye
3 Kieth
4 Jim
5 Eliz
6 Haun

Person_Project

id person_id project_id
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 4 1
7 4 4
8 5 1
9 6 1
10 6 2
11 6 3

I want to find all the persons who are working on same projects exact match. From the above data, result should be persons 1 and 5 because they both working on project 1, persons 2 and 3 because they are working on same projects 1 and 2.
Should not return 4 and 6 as there are no other persons working exact projects they are working on.

2

Answers


  1. Query:

    
     with projects as
      (
       select person_id, STRING_AGG (project_id::varchar,' , ' order by project_id ) project_ids from Person_Project
       group by Person_id
      )
      ,cte as
       (select project_ids from projects group by project_ids having count(*)>1)
      select person_id from projects where project_ids in (select project_ids from cte)
    
    

    OUtput:

    person_id
    1
    2
    3
    5

    fiddle

    Login or Signup to reply.
  2. This can be optimized.

    2x subquery, 1x window function

    SELECT person_id
    FROM  (
       SELECT person_id, count(*) OVER (PARTITION BY projects) AS match_ct
       FROM  (
          SELECT person_id, array_agg(project_id) AS projects
          FROM   person_project
          GROUP  BY 1
          ) sub1
       ) sub2
    WHERE  match_ct > 1;
    

    1x CTE, 1x EXISTS

    WITH cte AS (
       SELECT person_id, array_agg(project_id) AS projects
       FROM   person_project
       GROUP  BY 1
       )
    SELECT person_id
    FROM   cte c1
    WHERE  EXISTS (
       SELECT FROM cte c2
       WHERE  c2.projects = c1.projects
       AND    c2.person_id <> c1.person_id
       );
    

    In addition, array_agg() should be faster that string_agg() (also avoiding the cast).

    For large numbers of persons and/or projects, it should pay to create a temporary table instead of the CTE, add an index on (hash_array(projects), person_id), and compare the hash values.

    For very large sets, use hash_array_extended(projects, 0) to practically rule out hash collisions.

    Related:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search