Postgresql - Postgres: "cumulative" view of table

VladislavMikitich
May 14, 2023
258 views
0 votes
2 Answers

I have a table of changes like this

CREATE TABLE IF NOT EXISTS changes (
    entity_id TEXT NOT NULL,
    column_id TEXT NOT NULL,
    value JSONB NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

INSERT INTO changes VALUES ('1', 'height', to_jsonb(140), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(30), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'height', to_jsonb(145), '01-02-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(34),'01-03-2021 00:00:00'::TIMESTAMP);


 entity_id | column_id | value |     updated_at      
-----------+-----------+-------+---------------------
 1         | height    | 140   | 2021-01-01 00:00:00
 1         | weight    | 30    | 2021-01-01 00:00:00
 1         | height    | 145   | 2021-01-02 00:00:00
 1         | weight    | 34    | 2021-01-03 00:00:00

And I want to get kinda cumulative view of this table

entity_id  | height | weight |     updated_at      
-----------+--------+--------+---------------------
 1         | 140    | 30     | 2021-01-01 00:00:00
 1         | 145    | 30     | 2021-01-02 00:00:00
 1         | 145    | 34     | 2021-01-03 00:00:00

My current query looks working

SELECT
  entity_id,
  coalesce(change->'height', lag(change->'height', 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(change->'weight', lag(change->'weight', 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, json_object_agg(column_id, value) as change, updated_at FROM changes
    GROUP BY entity_id, updated_at
) as changes;

But I don’t like json_object_agg here and I’m sure there is a way do it without redundant aggregations? some kind of using window aggregate function that I’ve missed.

UPD. @SelVazi helped to make query better, but I feel it’s not the final solution.

with cte as (
  SELECT
    entity_id,
    max(case when column_id = 'height' then value::int end) as height,
    max(case when column_id = 'weight' then value::int end) as weight,
    updated_at
  from changes
  GROUP by entity_id, updated_at
)
select
  entity_id,
  coalesce(height, lag(height) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight) over (partition by entity_id order by updated_at)) as weight,
  updated_at
from cte;

Answers

Instead of using json_object_agg you can use the conditional aggregation to get hight and width as columns :

SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
                , MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
                , updated_at
FROM changes
GROUP BY entity_id, updated_at

The final query :

SELECT
  entity_id,
  coalesce(height, lag(height, 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight, 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
                , MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
                , updated_at
    FROM changes
    GROUP BY entity_id, updated_at
) as changes;

Result :

entity_id   height  weight  updated_at
1           140     30      2021-01-01 00:00:00
1           145     30      2021-02-01 00:00:00
1           145     34      2021-03-01 00:00:00

This is more complicated that it may seem. Pivoting the height and weight to columns can be done with conditional aggregation, but then we have to fill the "missing" values.

I would assume that there could be gaps of more than one date for any of the two measures, which makes lag() the wrong fit in Postgres, since it can only look back over a predefined number of rows (and cannot ignore null values).

We can demonstrate the problem with lag() by adding just one row at the end of your sample data:

entity_id	column_id	value	updated_at
1	height	140	2021-01-01 00:00:00
1	weight	30	2021-01-01 00:00:00
1	height	145	2021-02-01 00:00:00
1	weight	34	2021-03-01 00:00:00
1	weight	140	2021-04-01 00:00:00

One workaround uses a gaps-and-islands technique to put "missing" values in groups that start with a non-null value, which then becomes the new value.

select entity_id, updated_at,
    max(height) over(partition by entity_id, grp_height) height,
    max(weight) over(partition by entity_id, grp_weight) weight
from (
    select c.*,
        count(height) over (partition by entity_id order by updated_at) grp_height,
        count(weight) over (partition by entity_id order by updated_at) grp_weight
    from (
        select entity_id, updated_at,
            max(value::text) filter(where column_id = 'height') height,
            max(value::text) filter(where column_id = 'weight') weight
        from changes
        group by entity_id, updated_at
    ) c
) c
order by entity_id, updated_at

fiddle

entity_id	updated_at	height	weight
1	2021-01-01 00:00:00	140	30
1	2021-02-01 00:00:00	145	30
1	2021-03-01 00:00:00	145	34
1	2021-04-01 00:00:00	145	140

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Postgres: "cumulative" view of table

Answers