skip to Main Content

I want to compare and get number of corresposndeces between two arrays in PostgresSQL.

For example:
given the table

id array
1 a, b, d
2 a, e
3 a, b, d, e
4 b, c, d

I would like to obtain a comparison table:

id_1 id_2 number_of_matches
1 2 1
1 3 3
1 4 2
2 3 2
2 4 0
3 4 2

There is no repeated values in each array, and the order of items inside arrays does not matter.

I found here about the overlap operator (&&) but it only returns a boolean if there is or not any correspondence between two arrays, but not the number of correspondences.

CREATE TABLE data (id integer, val text[]);
INSERT INTO data VALUES (1, '{a,b,d}');
INSERT INTO data VALUES (2, '{a,e}');
INSERT INTO data VALUES (3, '{a,b,d,e}');
INSERT INTO data VALUES (4, '{b,c,d}');
SELECT a.id id_1, b.id id_2, a.val && b.val match 
FROM data a, data b
WHERE a.id < b.id;

returns

id_1 id_2 match
1 2 t
1 3 t
1 4 t
2 3 t
2 4 f
3 4 t

2

Answers


  1. demo:db<>fiddle

    WITH normalized AS (                      -- 2
        SELECT
            id,
            unnest(val) as array_element      -- 1
        FROM data
    )
    SELECT
        n1.id,
        n2.id,
        COUNT(*)
    FROM normalized n1
    JOIN normalized n2                        -- 3
        ON (n1.array_element = n2.array_element AND n1.id <> n2.id)
    GROUP BY n1.id, n2.id                     -- 4
    ORDER BY n1.id, n2.id
    
    1. Normalize your data (note: You should think about normalizing your data directly in the table; means: no arrays) by creating one row per array element
    2. Use a CTE to "store" your normalized result
    3. Self-join your normalized result (but not the same ids) at the array_element
    4. GROUP and COUNT
    Login or Signup to reply.
  2. https://www.db-fiddle.com/f/sobBPcfXY7h8j7y6tKXfDj/0

    get the intersection of the 2 arrays an count the elements of it:

     SELECT 
       data.id,
       data2.id as id2,
       data.val,
       data2.val val2,
       ARRAY( SELECT * FROM UNNEST( data.val ) WHERE UNNEST = ANY( data2.val )) as intersection,
       cardinality(ARRAY( SELECT * FROM UNNEST( data.val ) WHERE UNNEST = ANY( data2.val ))) as number_of_matches
       from data,data as data2
     where data.id <> data2.id
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search