
I have the following schema in PostgreSQL 12:

CREATE TABLE test_aggregate(
  id_semilavorato text NOT NULL PRIMARY KEY,
  array_allestimenti text[]
);

INSERT INTO test_aggregate VALUES ('A',ARRAY['IDA1','IDA2']);
INSERT INTO test_aggregate VALUES ('B',ARRAY['IDA1']);
INSERT INTO test_aggregate VALUES ('C',ARRAY['IDA2']);
INSERT INTO test_aggregate VALUES ('D',ARRAY['IDA3']);
INSERT INTO test_aggregate VALUES ('E',ARRAY['IDA4']); 

I would like to write a query that groups semi-finished products sharing at least one setup, directly or through another product in the group, and returns each group together with its combined setups. For the data above it should return 3 records with 2 columns, both of type text[]: semilavorati_comuni (common semi-finished products) and allastimenti_comuni (common setups).

The expected result would be:

semilavorati_comuni | allastimenti_comuni
{A,B,C} | {IDA1,IDA2}
{D}     | {IDA3}
{E}     | {IDA4}

How can I make this query?

2 Answers


    1. Start from every row in the table, wrapping its id_semilavorato in a one-element array[].
    2. Join that with the table, pairing each group with any row whose array_allestimenti overlaps the group's allastimenti_comuni (checked with &&) and whose id is not already in the group. To get only (A,B) without also getting the symmetric (B,A) pairing, join only rows with a higher id_semilavorato onto groups started by a lower one.
    3. In each pair, concatenate the allastimenti_comuni arrays with || and append the new id_semilavorato onto semilavorati_comuni with array_append().
    4. Do that again (and again, iteratively) until no more overlaps are found. The recursive CTE handles that.
    5. Afterwards, sort each group and keep only its unique elements. You can do that by opening the arrays up with unnest() and closing them back up with array_agg(distinct e order by e).
    6. Keep only the largest groups by discarding any group for which a larger group sharing some of its elements exists.
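
The array operations named in these steps can be tried on literal arrays before reading the full query. This is a minimal sketch that assumes only a plain PostgreSQL session; the column aliases are made up for illustration and appear nowhere in the question.

```sql
-- Literal arrays only, so no table is needed.
select array['IDA1','IDA2'] && array['IDA2','IDA3']    as overlaps,      -- true: IDA2 is shared
       array['IDA1'] || array['IDA2','IDA1']           as concatenated,  -- {IDA1,IDA2,IDA1}, duplicates kept
       array_append(array['A'], 'B')                   as appended,      -- {A,B}
       array_position(array['A','B'], 'C') is null     as not_in_group;  -- true: C is not in the array

-- Step 5: open the array with unnest(), then sort and deduplicate
-- while closing it back up with array_agg().
select array_agg(distinct e order by e) as deduplicated                  -- {IDA1,IDA2}
from unnest(array['IDA2','IDA1','IDA2']) as e;
```

Since || keeps duplicates, the deduplication in step 5 is what brings the concatenated arrays back to a clean set.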

    demo at db<>fiddle

    with recursive cte as (
      -- seed: every row starts as its own one-element group
      select array[id_semilavorato] as semilavorati_comuni
            ,array_allestimenti as allastimenti_comuni
      from test_aggregate
      union all
      -- recursive step: grow a group with any row that overlaps it
      select array_append(cte.semilavorati_comuni,t2.id_semilavorato)
            ,cte.allastimenti_comuni || t2.array_allestimenti
      from cte join test_aggregate t2 
      on cte.allastimenti_comuni && t2.array_allestimenti
      -- skip rows that are already members of the group
      and array_position(cte.semilavorati_comuni,t2.id_semilavorato) is null
      -- only add higher ids to groups seeded by a lower id (no symmetric pairs)
      and cte.semilavorati_comuni[1] < t2.id_semilavorato)
    select distinct 
      -- sort and deduplicate both arrays
      (select array_agg(distinct e order by e)
       from unnest(semilavorati_comuni)e)semilavorati_comuni
     ,(select array_agg(distinct e order by e)
       from unnest(allastimenti_comuni)e)allastimenti_comuni
    from cte n1
    -- keep a group only if no larger overlapping group exists
    where not exists(
      select from cte n2 
      where n1.allastimenti_comuni && n2.allastimenti_comuni
      and array_length(n2.allastimenti_comuni,1)
         >array_length(n1.allastimenti_comuni,1));
    
    semilavorati_comuni | allastimenti_comuni
    {A,B,C,L,M}         | {IDA1,IDA2}
    {D}                 | {IDA3}
    {E}                 | {IDA4}
    {F,G,H}             | {IDA5,IDA6}
    {I,J,K}             | {IDA0,IDA7,IDA8,IDA9}

    With the additional set-based operators provided by the intarray extension, some of the logic here could be a bit simpler and faster if these were all integer identifiers.
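
As a rough illustration of what intarray offers, the sketch below uses its union operator and its sort() and uniq() functions on integer literals. It assumes the identifiers have been mapped to integers, which is not the case in the question, and that you are allowed to create extensions in the database; the column aliases are made up.

```sql
-- Assumption: the current role may install extensions.
create extension if not exists intarray;

select array[1,2] && array[2,3]      as overlaps,      -- same overlap test as in the query above
       array[1,2] | array[2,3]       as merged,        -- intarray union of the two arrays
       uniq(sort(array[3,1,2,3]))    as deduplicated;  -- {1,2,3}: sort, then drop adjacent duplicates
```

With these, the unnest() and array_agg(distinct ...) round trip could in principle be replaced by a single uniq(sort(...)) call or by the union operator.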

  1. This can be achieved by first selecting each id that overlaps another row together with the longer of the two overlapping arrays, then adding the remaining ids that have no common elements, and finally grouping by the array.

    with common_elements as (
      -- for each id that overlaps another row, keep the longer of the two arrays
      select distinct t1.id_semilavorato id_semilavorato,
      case when array_length(t1.array_allestimenti, 1) > array_length(t2.array_allestimenti, 1)
          then t1.array_allestimenti 
          else t2.array_allestimenti 
      end array_allestimenti
      from test_aggregate t1, test_aggregate t2
      where t1.id_semilavorato != t2.id_semilavorato and 
      t1.array_allestimenti && t2.array_allestimenti )
    -- ids without any overlap fall back to their own array via coalesce
    select array_agg(ta.id_semilavorato) semilavorati_comuni, 
    coalesce(ce.array_allestimenti, ta.array_allestimenti) allastimenti_comuni
    from common_elements ce
    right join test_aggregate ta on ce.id_semilavorato = ta.id_semilavorato
    group by 2; -- group by the second output column
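
One case worth checking with this approach is a chain in which no single row holds every setup of its group. The sketch below reruns the case expression from common_elements on two hypothetical rows, X and Y, which are not part of the question's data; chain and chosen_array are illustrative names as well.

```sql
-- Hypothetical test rows: X and Y share IDA11, but neither array contains the other.
with chain(id_semilavorato, array_allestimenti) as (
  values ('X', array['IDA10','IDA11']),
         ('Y', array['IDA11','IDA12'])
)
select distinct t1.id_semilavorato,
       -- same rule as in common_elements: keep the longer array, else the other row's
       case when array_length(t1.array_allestimenti, 1) > array_length(t2.array_allestimenti, 1)
            then t1.array_allestimenti
            else t2.array_allestimenti
       end as chosen_array
from chain t1
join chain t2
  on t1.id_semilavorato != t2.id_semilavorato
 and t1.array_allestimenti && t2.array_allestimenti;
```

Both arrays have the same length here, so each id is paired with the other row's array rather than with the union {IDA10,IDA11,IDA12}, and grouping by that array would keep X and Y apart. For the sample data in the question this does not matter, because A already contains every setup of its group.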
    

    Fiddle to test
