Postgresql - How to merge arrays but also not duplicate another field

ChristopherLang
April 3, 2024
142 views
0 votes
2 Answers

I have some entries like this:

Distance	Ids	Zone	Owner
331	[1,2,4]	A	ABU
200	[3,4,5]	B	ABU

and I want to aggregate across them so that the distances sum, but then have a single array for ids that includes all the unique ids that will be aggregated. So the desired query output would be:

Distance	Ids	Zone	Owner
531	[1,2,3,4,5]	[A,B]	ABU

and here is a simple version of what I have so far:

SELECT   
  SUM(distance) as distance,
  ARRAY_AGG(id) as ids,
  ARRAY_AGG(zone) as zones,
  owner
FROM (   
  SELECT
    distance,
    UNNEST(ids) as id,
    zone,
    owner
FROM table)
GROUP BY owner

But this is counting each distance multiple times in the final output.

How can I reaggregate the ids without messing up the sum?

Tags: postgresql

Answers

- ValNik
- April 3, 2024 at 12:13 am
- 0 votes
0
You can calculate sum(distance) and array_agg(id) separately.
See example
```
select t.distance,t.owner,g.ids,g.zones
from( 
   select owner,sum(distance) distance from test group by owner
   ) t
left join (
   select owner,array_agg(distinct id)ids,array_agg(distinct zone) zones
   from( SELECT owner,zone,UNNEST(ids) as id FROM test ) x
   group by owner 
  ) g on t.owner=g.owner
```
Fiddle
Login or Signup to reply.

- Atmo
- April 3, 2024 at 3:53 am
- 0 votes
0
Your schema is not normalized and that is why you are struggling. The best approach here is to move ids into separate records of an integer field, in a separate table, then join that table to your main one.

If you cannot make the change immediately, an alternative to the method presented in the other answer is to flatten an array of dimension 2 (obtained with ARRAY_AGG) back into an array of dimension 1, all while ensuring the reconstructed array only keeps distinct values in order.

If you are guaranteed all the arrays contained in ids have the same length (ARRAY_AGG requires its parameters to have the same length), this can be done like so:
```
SELECT SUM(Distance),
       ARRAY(SELECT DISTINCT UNNEST(ARRAY_AGG(ids)) ORDER BY 1),
       ARRAY_AGG(Zone),
       Owner    
FROM T
GROUP BY Owner
```
If the above condition is not guaranteed, you need to transform ids so that it is.
The below expression used SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T to determine how long arrays should be and therefore append the right number of NULL to make all the arrays in ids the same length:
```
Ids || ARRAY_FILL(null::integer, ARRAY[(SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T) - ARRAY_LENGTH(ids, 1)])
```
Therefore, the above query becomes:
```
SELECT SUM(Distance), 
       ARRAY(SELECT DISTINCT * FROM UNNEST(ARRAY_AGG(Ids || ARRAY_FILL(null::integer, ARRAY[(SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T) - ARRAY_LENGTH(ids, 1)]))) WHERE UNNEST IS NOT NULL ORDER BY 1),
       ARRAY_AGG(Zone),
       Owner    
FROM T
GROUP BY Owner
```
Note: without access to your real table, it is hard for me to say how well the above query will perform.
- Variations of it using JOIN or CROSS APPLY could possibly have better performance.
- You could also add a condition so that you do not make all the
  transformed ids arrays the same length across the whole table but
  rather within each group of Owner.
At the risk of repeating myself, the correct long-term solution is to normalize your schema, though.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to merge arrays but also not duplicate another field

Answers