Postgresql - Postgres: Find co-occurrences of array elements

mhkeller
July 26, 2023
171 views
0 votes
2 Answers

I have a text array field in postgres we can call items. I want to create a query that will tell me how many times each of these items co-occurs with one another.

For an example set of rows:

items
-----
{'a', 'c'}
{'a', 'b', 'c'}
{'a', 'c'}
{'a', 'b', 'c'}

Here is example output using a : to separate the item name and the count of co-occurrences:

item|co_occurrences
-------------------
a   |{c:4,b:2}
b   |{a:2,c:2}
c   |{a:4,b:2}

The item column lists the individual items. The co_occurences column is an array of text elements combining the co-occurring item and the count. What query will produce this?

Tags: arrays postgresql

Answers

Convert it into normal relational form first.

This code assigns arbitrary row id values:

with create_ids as (
  select row_number() over (order by items) as id,
         items
    from item_groups
), normalize as (
  select i.id, u.item
    from create_ids i
         cross join lateral unnest(i.items) as u(item)
), correlate as (
  select a.item, b.item as coitem, count(b.item) as occurence_count
    from normalize a
         left join normalize b
           on b.id = a.id and b.item != a.item
   group by a.item, b.item
)fiddle
select item, jsonb_object_agg(coitem, occurence_count) as co_occurences
  from correlate 
 group by item;

Working fiddle

If the objective does not have to be JSON, then we can construct a string. This allows ordering the values in the string:

with create_ids as (
  select row_number() over (order by items) as id,
         items
    from item_groups
), normalize as (
  select i.id, u.item
    from create_ids i
         cross join lateral unnest(i.items) as u(item)
), correlate as (
  select a.item, b.item as coitem, count(b.item) as occurence_count
    from normalize a
         left join normalize b
           on b.id = a.id and b.item != a.item
   group by a.item, b.item
)
select item, array_agg(coitem||':'||occurence_count order by occurence_count desc) as co_occurences
  from correlate 
 group by item;

Updated fiddle

- Bergi
- July 26, 2023 at 10:55 pm
- 0 votes
0
Instead of generating row ids for the self-join like @MikeOrganek, I’d just unnest twice to produce a relation with duplicate tuples:
```
SELECT x AS item, json_object_agg(y, count) AS co_occurences
FROM (
  SELECT x, y, count(*)
  FROM example, unnest(items) AS x, unnest(items) AS y
  WHERE x <> y
  GROUP BY x, y
) tmp
GROUP BY x;
```
^{(online demo)}
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Postgres: Find co-occurrences of array elements

Answers