I have some entries like this:
Distance | Ids | Zone | Owner |
---|---|---|---|
331 | [1,2,4] | A | ABU |
200 | [3,4,5] | B | ABU |
and I want to aggregate across them so that the distances sum, but then have a single array for ids that includes all the unique ids that will be aggregated. So the desired query output would be:
Distance | Ids | Zone | Owner |
---|---|---|---|
531 | [1,2,3,4,5] | [A,B] | ABU |
and here is a simple version of what I have so far:
SELECT
SUM(distance) as distance,
ARRAY_AGG(id) as ids,
ARRAY_AGG(zone) as zones,
owner
FROM (
SELECT
distance,
UNNEST(ids) as id,
zone,
owner
FROM table)
GROUP BY owner
But this is counting each distance multiple times in the final output.
How can I reaggregate the ids without messing up the sum?
2
Answers
You can calculate sum(distance) and array_agg(id) separately.
See example
Fiddle
Your schema is not normalized and that is why you are struggling. The best approach here is to move
ids
into separate records of aninteger
field, in a separate table, then join that table to your main one.If you cannot make the change immediately, an alternative to the method presented in the other answer is to flatten an array of dimension 2 (obtained with
ARRAY_AGG
) back into an array of dimension 1, all while ensuring the reconstructed array only keepsdistinct
values in order.If you are guaranteed all the arrays contained in
ids
have the same length (ARRAY_AGG
requires its parameters to have the same length), this can be done like so:If the above condition is not guaranteed, you need to transform ids so that it is.
The below expression used
SELECT MAX(ARRAY_LENGTH(ids, 1)) FROM T
to determine how long arrays should be and therefore append the right number ofNULL
to make all the arrays inids
the same length:Therefore, the above query becomes:
Note: without access to your real table, it is hard for me to say how well the above query will perform.
JOIN
orCROSS APPLY
could possibly have better performance.transformed
ids
arrays the same length across the whole table butrather within each group of
Owner
.At the risk of repeating myself, the correct long-term solution is to normalize your schema, though.