skip to Main Content

I have some entries like this:

Distance Ids Zone Owner
331 [1,2,4] A ABU
200 [3,4,5] B ABU

and I want to aggregate across them so that the distances sum, but then have a single array for ids that includes all the unique ids that will be aggregated. So the desired query output would be:

Distance Ids Zone Owner
531 [1,2,3,4,5] [A,B] ABU

and here is a simple version of what I have so far:

SELECT   
  SUM(distance) as distance,
  ARRAY_AGG(id) as ids,
  ARRAY_AGG(zone) as zones,
  owner
FROM (   
  SELECT
    distance,
    UNNEST(ids) as id,
    zone,
    owner
FROM table)
GROUP BY owner

But this is counting each distance multiple times in the final output.

How can I reaggregate the ids without messing up the sum?

2

Answers


  1. You can calculate sum(distance) and array_agg(id) separately.
    See example

    select t.distance,t.owner,g.ids,g.zones
    from( 
       select owner,sum(distance) distance from test group by owner
       ) t
    left join (
       select owner,array_agg(distinct id)ids,array_agg(distinct zone) zones
       from( SELECT owner,zone,UNNEST(ids) as id FROM test ) x
       group by owner 
      ) g on t.owner=g.owner
    

    Fiddle

    Login or Signup to reply.
  2. Your schema is not normalized and that is why you are struggling. The best approach here is to move ids into separate records of an integer field, in a separate table, then join that table to your main one.

    If you cannot make the change immediately, an alternative to the method presented in the other answer is to flatten an array of dimension 2 (obtained with ARRAY_AGG) back into an array of dimension 1, all while ensuring the reconstructed array only keeps distinct values in order.

    If you are guaranteed all the arrays contained in ids have the same length (ARRAY_AGG requires its parameters to have the same length), this can be done like so:

    SELECT SUM(Distance),
           ARRAY(SELECT DISTINCT UNNEST(ARRAY_AGG(ids)) ORDER BY 1),
           ARRAY_AGG(Zone),
           Owner    
    FROM T
    GROUP BY Owner
    

    If the above condition is not guaranteed, you need to transform ids so that it is.
    The below expression used SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T to determine how long arrays should be and therefore append the right number of NULL to make all the arrays in ids the same length:

    Ids || ARRAY_FILL(null::integer, ARRAY[(SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T) - ARRAY_LENGTH(ids, 1)])
    

    Therefore, the above query becomes:

    SELECT SUM(Distance), 
           ARRAY(SELECT DISTINCT * FROM UNNEST(ARRAY_AGG(Ids || ARRAY_FILL(null::integer, ARRAY[(SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T) - ARRAY_LENGTH(ids, 1)]))) WHERE UNNEST IS NOT NULL ORDER BY 1),
           ARRAY_AGG(Zone),
           Owner    
    FROM T
    GROUP BY Owner
    

    Note: without access to your real table, it is hard for me to say how well the above query will perform.

    • Variations of it using JOIN or CROSS APPLY could possibly have better performance.
    • You could also add a condition so that you do not make all the
      transformed ids arrays the same length across the whole table but
      rather within each group of Owner.

    At the risk of repeating myself, the correct long-term solution is to normalize your schema, though.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search