skip to Main Content

Via SQL, I’m trying to get from this:

id group_id session_id field_label field_value sent_at
1 frosted flakes blue bowl first_name Bob 2022-11-05 18:18:19.093
2 frosted flakes blue bowl first_name Bobby 2022-11-05 18:17:31.274
3 frosted flakes blue bowl last_name Brown 2022-11-05 18:17:16.241
4 frosted flakes blue bowl last_name Browning 2022-11-05 18:15:34.492
5 frosted flakes blue bowl last_name Brownson 2022-11-05 18:14:58.465
6 cheerios green cup first_name Christine 2022-11-05 18:18:58.222
7 cheerios green cup last_name Christmas 2022-11-05 18:20:41.212
8 cheerios green cup last_name Christopherson 2022-11-05 18:24:58.222

where

  • id is unique
  • group_id is not unique
  • session_id is not unique

to this:

group_id session_id amalgamated_field
frosted flakes blue bowl Bob Brown
cheerios green cup Christine Christopherson

Where I know the field_labels that I want to amalgamate, and I want to get the latest value for each amalgamated field label based on sent_at grouped by group_id.

So for group frosted flakes, I want to get the most recent field_value associated with field_label first_name (Bob) and the most recent field_value associated with field_label last_name (Brown).

And repeat.

I tried a cross join and I also tried an inner join similar to this thread. But I keep getting all combinations :/

3

Answers


  1. You may use ROW_NUMBER function with conditional aggregation as the following:

    SELECT group_id, session_id,
           CONCAT_WS(' ', 
                      MAX(CASE WHEN field_label='first_name' AND rn=1 THEN field_value END),
                      MAX(CASE WHEN field_label='last_name' AND rn=1 THEN field_value END)
                    ) AS amalgamated_field
    FROM
    (
      SELECT *, 
        ROW_NUMBER() OVER (PARTITION BY group_id, session_id, field_label ORDER BY sent_at DESC) rn
      FROM table_name
    ) T
    GROUP BY group_id, session_id
    ORDER BY group_id
    

    See a demo.

    Login or Signup to reply.
  2. You can first find the maximum timestamp for each field_label, and then perform a self-join of the original table back onto the maximum timestamp, aggregating the field_values for each group_id:

    select t1.group_id, t2.session_id, string_agg(t2.field_value,' ') 
    from (select t.group_id, t.field_label, max(t.sent_at) d 
       from tbl t group by t.group_id, t.field_label) t1 
    join tbl t2 on t1.d = t2.sent_at group by t1.group_id, t2.session_id
    

    See fiddle.

    Login or Signup to reply.
  3. In Postgres I would recommend distinct on and string aggregation:

    select group_id, session_id, 
        string_agg(field_value, ' ' order by field_label) full_name
    from (
        select distinct on (group_id, session_id, field_label) t.*
        from mytable t
        order by group_id, session_id, field_label, sent_at desc
    ) t
    where field_label in ('first_name', 'last_name')
    group by group_id, session_id
    

    The distinct on subquery returns the latest row for each group/session/label tuple. In the outer query, we filter on the two labels we are interested in, and aggregate the field values for each group/session tuple. The order by clause of string_agg ensures that the first and last name are concatenated in sequence.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search