How to query PostgreSQL table for common elements in array column and aggregate results?

AndreaBoc
June 13, 2024
2 Answers

I have the following schema in PostgreSQL 12:

CREATE TABLE test_aggregate(
  id_semilavorato text NOT NULL PRIMARY KEY,
  array_allestimenti text[]
);

INSERT INTO test_aggregate VALUES ('A',ARRAY['IDA1','IDA2']);
INSERT INTO test_aggregate VALUES ('B',ARRAY['IDA1']);
INSERT INTO test_aggregate VALUES ('C',ARRAY['IDA2']);
INSERT INTO test_aggregate VALUES ('D',ARRAY['IDA3']);
INSERT INTO test_aggregate VALUES ('E',ARRAY['IDA4']);

I would like to write a query that groups semi-finished products sharing at least one setup and returns, for each group, the setups they have in common. For the data above it should return 3 records with 2 columns, both of type text[]: semilavorati_comuni (common semi-finished products) and allastimenti_comuni (common setups). Note that B and C share no setup directly but belong to the same group because each of them overlaps with A.

The expected result would be:

semilavorati_comuni | allastimenti_comuni
{A,B,C} | {IDA1,IDA2}
{D}     | {IDA3}
{E}     | {IDA4}

How can I write this query?

Tags: postgresql

Answers

- Zegarek
- June 12, 2024 at 6:54 pm
1. Select all rows in the table, wrapping each id_semilavorato in an array[], so that every row starts out as its own group.
2. Join that with the table, pairing each group with every row whose array_allestimenti overlaps the group's allastimenti_comuni (checked with &&), as long as that row isn’t already in the group. To get only (A,B) without also getting the symmetric (B,A) pairing, only join rows whose id_semilavorato is higher than the group's first element.
3. In each pair, concatenate the two setup arrays with || and append the new id_semilavorato onto semilavorati_comuni with array_append().
4. Repeat that iteratively until no more overlaps are found. The recursive CTE handles this.
5. Afterwards, sort each group and keep only its unique elements. You can do that by opening the arrays up with unnest() and closing them back up with array_agg(distinct e order by e).
6. Keep only the largest groups by checking whether a larger group exists that shares any elements.
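The array building blocks used in these steps can be tried on their own. The following is a minimal sketch with literal values chosen purely for illustration (they are not taken from the table); the comments show what each statement returns for these literals.

```sql
-- && : true when the two arrays share at least one element
select array['IDA1','IDA2'] && array['IDA1'];          -- true

-- || : plain concatenation, duplicates are kept
select array['IDA1','IDA2'] || array['IDA1'];          -- {IDA1,IDA2,IDA1}

-- array_append : add one element to the end of an array
select array_append(array['A'], 'B');                  -- {A,B}

-- array_position : NULL when the element is not in the array
select array_position(array['A','B'], 'C') is null;    -- true

-- unnest + array_agg(distinct ... order by ...) : sort and deduplicate
select array_agg(distinct e order by e)
from unnest(array['IDA2','IDA1','IDA2']) e;            -- {IDA1,IDA2}
```

Because || keeps duplicates, the arrays built during recursion grow with every merge, which is why the deduplication step is applied only at the end.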
Demo at db<>fiddle:
```
with recursive cte as (
  -- anchor: every row starts as its own single-element group
  select array[id_semilavorato] as semilavorati_comuni
        ,array_allestimenti as allastimenti_comuni
  from test_aggregate
  union all
  -- recursive step: pull in any row whose array overlaps the group's array
  select array_append(cte.semilavorati_comuni, t2.id_semilavorato)
        ,cte.allastimenti_comuni || t2.array_allestimenti
  from cte join test_aggregate t2
  on cte.allastimenti_comuni && t2.array_allestimenti
  -- skip rows that are already part of the group
  and array_position(cte.semilavorati_comuni, t2.id_semilavorato) is null
  -- only add ids higher than the group's first one, to avoid symmetric pairings
  and cte.semilavorati_comuni[1] < t2.id_semilavorato)
select distinct
  -- sort and deduplicate both arrays
  (select array_agg(distinct e order by e)
   from unnest(semilavorati_comuni) e) semilavorati_comuni
 ,(select array_agg(distinct e order by e)
   from unnest(allastimenti_comuni) e) allastimenti_comuni
from cte n1
-- keep only the largest groups: drop any row that overlaps a longer one
where not exists(
  select from cte n2
  where n1.allastimenti_comuni && n2.allastimenti_comuni
  and array_length(n2.allastimenti_comuni,1)
     >array_length(n1.allastimenti_comuni,1));
```
semilavorati_comuni | allastimenti_comuni
{A,B,C,L,M} | {IDA1,IDA2}
{D}         | {IDA3}
{E}         | {IDA4}
{F,G,H}     | {IDA5,IDA6}
{I,J,K}     | {IDA0,IDA7,IDA8,IDA9}

If these were all integer identifiers, the additional set-based operators provided by the intarray extension could make some of the logic here a bit simpler and faster.
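As a rough illustration of that remark, here is a sketch that assumes the intarray extension can be installed and that the identifiers are stored as integer[] rather than text[]. The literal values are made up for the example.

```sql
-- assumption: you have the privileges to install the extension
create extension if not exists intarray;

-- | : union of two integer arrays, returned without duplicates
select array[1,2] | array[2,3];        -- {1,2,3}

-- sort() orders the array, uniq() then drops adjacent duplicates
select uniq(sort(array[3,1,3,2]));     -- {1,2,3}
```

With integer identifiers, the | operator could stand in for the || concatenation followed by the unnest()/array_agg(distinct ...) cleanup, since it merges and deduplicates in one step.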

semilavorati_comuni	allastimenti_comuni
{A,B,C,L,M}	{IDA1,IDA2}
{D}	{IDA3}
{E}	{IDA4}
{F,G,H}	{IDA5,IDA6}
{I,J,K}	{IDA0,IDA7,IDA8,IDA9}

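That can be achieved by first selecting the ids whose array overlaps with another row's, each paired with the longer of the two overlapping arrays, then adding the remaining ids that share no elements with any other row, and finally grouping by the array: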

with common_elements as (
  -- for every pair of distinct rows with overlapping arrays,
  -- keep t1's id together with the longer of the two arrays
  select distinct t1.id_semilavorato id_semilavorato,
    case when array_length(t1.array_allestimenti, 1) > array_length(t2.array_allestimenti, 1)
      then t1.array_allestimenti
      else t2.array_allestimenti
    end array_allestimenti
  from test_aggregate t1, test_aggregate t2
  where t1.id_semilavorato != t2.id_semilavorato
    and t1.array_allestimenti && t2.array_allestimenti)
-- rows without any overlap fall back to their own array via coalesce
select array_agg(ta.id_semilavorato) semilavorati_comuni,
  coalesce(ce.array_allestimenti, ta.array_allestimenti) allastimenti_comuni
from common_elements ce
right join test_aggregate ta on ce.id_semilavorato = ta.id_semilavorato
group by 2;

Fiddle to test
