I use PostgreSQL 14 to manage a table that stores updates to a table of medics: users can update the first name, last name, and/or age of a medic. A field that was not touched by an update operation has a NULL value.
Here’s an example of four edits touching two separate medics. The medic with ID 3 received three updates: the first two update the `age` field, the third one touches `first_name`:
```sql
SELECT * FROM medic_edits;
```

id | medic_id | first_name | last_name | age |
---|---|---|---|---|
1 | 1 | Indy | | |
2 | 3 | | | 59 |
3 | 3 | | | 63 |
4 | 3 | Bob | | |
I would like to merge this table so that the result has one row per medic, giving the cumulative edits. This is my current query and the output it produces:
```sql
SELECT
    medic_id,
    -- Per column: collect values newest-first, drop the NULLs, keep the first one
    (ARRAY_REMOVE(ARRAY_AGG(first_name ORDER BY id DESC), NULL))[1] AS first_name,
    (ARRAY_REMOVE(ARRAY_AGG(last_name ORDER BY id DESC), NULL))[1] AS last_name,
    (ARRAY_REMOVE(ARRAY_AGG(age ORDER BY id DESC), NULL))[1] AS age
FROM medic_edits
GROUP BY medic_id;
```

medic_id | first_name | last_name | age |
---|---|---|---|
1 | Indy | | |
3 | Bob | | 63 |
This is exactly the output I expected, but I suspect that the `ARRAY_REMOVE`/`ARRAY_AGG` logic is a bit wasteful. I wonder if partitions (window functions) could be put to good use here; the `FIRST_VALUE` function looks very relevant.
3 Answers
Yes, it’s wasteful. I expect this to be faster:
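A minimal sketch of one such window-function query (an illustration, not necessarily the exact query meant here), assuming the `medic_edits` columns from the question with `text`/`int` types (an assumption) and inlining the sample rows as a CTE so it runs on its own; drop the CTE to query the real table. PostgreSQL 14's `first_value()` has no `IGNORE NULLS` option, so each window sorts non-NULL values first and the newest `id` first within them:

```sql
-- Sample rows from the question; the CTE stands in for the real table.
WITH medic_edits (id, medic_id, first_name, last_name, age) AS (
   VALUES (1, 1, 'Indy', NULL::text, NULL::int)
        , (2, 3, NULL  , NULL      , 59)
        , (3, 3, NULL  , NULL      , 63)
        , (4, 3, 'Bob' , NULL      , NULL)
)
SELECT DISTINCT ON (medic_id)
       medic_id
       -- "x IS NULL" sorts false (non-NULL) before true, then newest edit first,
       -- so first_value() returns the latest non-NULL value of each column.
     , first_value(first_name) OVER (PARTITION BY medic_id ORDER BY first_name IS NULL, id DESC) AS first_name
     , first_value(last_name)  OVER (PARTITION BY medic_id ORDER BY last_name  IS NULL, id DESC) AS last_name
     , first_value(age)        OVER (PARTITION BY medic_id ORDER BY age        IS NULL, id DESC) AS age
FROM   medic_edits
ORDER  BY medic_id;
```

Every row of a partition carries the same window results, so `DISTINCT ON (medic_id)` can keep any one of them. On the sample data this returns `1 | Indy | |` and `3 | Bob | | 63`, matching the question's output.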
For descending `id` values, use instead:

See:
But there are probably faster ways yet. It also depends on the exact table definition, cardinalities, and data distribution.
See:
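One commonly benchmarked alternative, sketched here as an assumption rather than a guaranteed win: one `LIMIT 1` subquery per column. With an index on `(medic_id, id)` in the real table (hypothetical; not part of the question), each lookup can scan the index backwards and stop at the first non-NULL value. Sample rows are again inlined as a CTE:

```sql
-- Sample rows from the question; replace the CTE with the real table.
WITH medic_edits (id, medic_id, first_name, last_name, age) AS (
   VALUES (1, 1, 'Indy', NULL::text, NULL::int)
        , (2, 3, NULL  , NULL      , 59)
        , (3, 3, NULL  , NULL      , 63)
        , (4, 3, 'Bob' , NULL      , NULL)
)
SELECT m.medic_id
       -- Each scalar subquery fetches the newest non-NULL value of one column.
     , (SELECT e.first_name FROM medic_edits e
        WHERE  e.medic_id = m.medic_id AND e.first_name IS NOT NULL
        ORDER  BY e.id DESC LIMIT 1) AS first_name
     , (SELECT e.last_name FROM medic_edits e
        WHERE  e.medic_id = m.medic_id AND e.last_name IS NOT NULL
        ORDER  BY e.id DESC LIMIT 1) AS last_name
     , (SELECT e.age FROM medic_edits e
        WHERE  e.medic_id = m.medic_id AND e.age IS NOT NULL
        ORDER  BY e.id DESC LIMIT 1) AS age
FROM  (SELECT DISTINCT medic_id FROM medic_edits) m
ORDER  BY m.medic_id;
```

If a separate `medics` table exists, driving the outer query from it instead of `SELECT DISTINCT medic_id` avoids reading every edit just to list the medics.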
About `DISTINCT ON`: it works in a single `SELECT` because `DISTINCT` and `DISTINCT ON` are applied after window functions. See:

Aside: "age" is going to bit-rot rapidly. It’s typically better to store a birthday.
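To illustrate the aside, a minimal sketch assuming a hypothetical `birthday date` column (not part of the question's schema). PostgreSQL's built-in `age()` function derives the age at query time, so the value never goes stale:

```sql
-- Hypothetical medics table storing a birthday instead of an age.
WITH medics (medic_id, birthday) AS (
   VALUES (1, date '1960-05-01')
        , (3, date '1975-11-23')
)
SELECT medic_id
     , birthday
       -- age(x) returns the interval from x to current_date (at midnight);
       -- its year field is the age in completed years.
     , extract(year FROM age(birthday))::int AS age
FROM   medics
ORDER  BY medic_id;
```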
I have written two example queries for you that give the same results:
Sorry, I didn’t understand your question correctly at first. I wrote a new query; this one is right.