skip to Main Content

I use PostgreSQL 14 to manage a table which stores updates to a table of medics: users can update the first name, last name, and or/ the age of the medic. A field which was not touched by an update operation has a NULL value.

Here’s an example of four edits touching two separate medics. The medic with ID 3 received three updates: the first two are updating the age field, the third one touches the first_name:

SELECT * FROM medic_edits;
id medic_id first_name last_name age
1 1 Indy
2 3 59
3 3 63
4 3 Bob

I would like to merge this table such that in the resulting table there’s one row per medic, giving the cumulative edits. This is my current query and the output it produces:

SELECT
  medic_id,
  (ARRAY_REMOVE(ARRAY_AGG(first_name ORDER BY id DESC), NULL))[1] AS first_name,
  (ARRAY_REMOVE(ARRAY_AGG(last_name ORDER BY id DESC), NULL))[1] AS last_name,
  (ARRAY_REMOVE(ARRAY_AGG(age ORDER BY id DESC), NULL))[1] AS last_name
FROM medic_edits
GROUP BY medic_id
;
medic_id first_name last_name last_name
1 Indy
3 Bob 63

This is exactly the output I expected, but I suspect that the ARRAY_REMOVE/ARRAY_AGG logic is a bit wasteful. I wonder if there is a way to use partitions for good profit here, the FIRST_VALUE function looks very relevant.

3

Answers


  1. Yes, it’s wasteful. I expect this to be faster:

    SELECT DISTINCT ON (medic_id)
           medic_id
         , first_value(first_name) OVER (PARTITION BY medic_id ORDER BY CASE WHEN first_name IS NOT NULL THEN id END) AS first_name
         , first_value(last_name)  OVER (PARTITION BY medic_id ORDER BY CASE WHEN last_name  IS NOT NULL THEN id END) AS last_name
         , first_value(age)        OVER (PARTITION BY medic_id ORDER BY CASE WHEN age        IS NOT NULL THEN id END) AS age
    FROM   medic_edits;
    

    For descending id value, use instead:

           first_value(first_name) OVER (PARTITION BY medic_id ORDER BY CASE WHEN first_name IS NOT NULL THEN id END DESC NULLS LAST) AS first_name
    

    See:

    But there are probably faster ways, yet. Also depends on the exact table definition, cardinalities, and data distribution.

    See:

    About DISTINCT ON:

    Works in a single SELECT because DISTINCT or DISTINCT ON are applied after window functions. See:

    Aside: "age" is going to bit-rot rapidly. It’s typically superior to store a birthday.

    Login or Signup to reply.
  2. I have written two example queries for you that give the same results:

    select
        medic_id,
        max(first_name) as first_name,
        max(last_name) as last_name,
        max(age) as age  
    from medic_edits
    group by medic_id 
    
    
    select distinct 
        medic_id,  
        first_value(first_name) over (partition by medic_id order by first_name desc) as first_name,
        first_value(last_name) over (partition by medic_id  order by last_name desc) as last_name,
        max(age) over (PARTITION BY medic_id) as age
    from medic_edits
    
    
    Results:
    medic_id    first_name  last_name   age
    1           Indy        
    3           Bob                     63
    
    Login or Signup to reply.
  3. Sorry I don’t understand your question correctly,
    I wrote new query, this is right.

    select distinct 
        medic_id,  
        first_value(coalesce(first_name)) over (partition by medic_id order by case when first_name is not null then 0 else 1 end ASC, id desc) as first_name,
        first_value(coalesce(last_name)) over (partition by medic_id  order by case when last_name is not null then 0 else 1 end ASC, id desc) as last_name,
        first_value(coalesce(age)) over (partition by medic_id  order by case when age is not null then 0 else 1 end ASC, id desc) as age
    from medic_edits
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search