
I have a Postgres table foobar with the following schema:

id: integer
group: integer
foo: integer
bar: integer
timestamp: integer

I keep track of updates to various groups, and those updates mutate the foo and bar properties. Every time I receive an update, I store the timestamp.

Here’s an example of the data I could have in the database:

+------+---------+-------+-------+-------------+
| "id" | "group" | "foo" | "bar" | "timestamp" |
+------+---------+-------+-------+-------------+
| 1    | 1       | 10    | 20    | 1           |
| 2    | 1       | 11    | 19    | 2           |
| 3    | 1       | 10    | 20    | 3           |
| 4    | 1       | 10    | 20    | 4           |
+------+---------+-------+-------+-------------+
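
To reproduce the example, here is a minimal setup sketch. The column types follow the schema above; the primary key and the NOT NULL constraints are assumptions added for illustration and are not part of the original table definition.

```sql
-- Setup sketch: "group" is a reserved word and must be quoted;
-- "timestamp" is quoted only for consistency.
CREATE TABLE foobar (
    id          integer PRIMARY KEY,   -- assumption: id is the key
    "group"     integer NOT NULL,      -- assumption: no NULLs
    foo         integer NOT NULL,
    bar         integer NOT NULL,
    "timestamp" integer NOT NULL
);

-- The four sample rows from the table above.
INSERT INTO foobar (id, "group", foo, bar, "timestamp") VALUES
    (1, 1, 10, 20, 1),
    (2, 1, 11, 19, 2),
    (3, 1, 10, 20, 3),
    (4, 1, 10, 20, 4);
```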

Oftentimes, the updates I receive are identical. A particularly critical piece of information I’m trying to extract is when I first received the combination of values that is current: not the first time it ever appeared, but the first update since which nothing has changed. In the example above, that is the row with timestamp 3.

A naïve approach would be the following query:

SELECT DISTINCT ON ("group", foo, bar) *
FROM foobar
ORDER BY "group", foo, bar, timestamp DESC;

However, for the current combination that query returns the last row (timestamp 4), which simply has the latest timestamp. If I switch timestamp to ASC, I get the very first row (timestamp 1), because the exact same foo/bar combination had already appeared before the update at timestamp 2.

The intuitive thing would have been to simply move the timestamp DESC sort key ahead of foo, but Postgres does not allow that: the DISTINCT ON expressions must match the leftmost ORDER BY expressions. Something like MySQL’s permissive HAVING could also have come in handy, but Postgres’s HAVING is stricter and cannot reference ungrouped columns in that way.

An incredibly inefficient approach would be to iterate through each group programmatically, get the latest row, and then fetch all rows in descending timestamp order, stopping as soon as I observe a change. It seems wiser to let the database do this sort of operation.

I am quite certain that I’m missing something obvious, but would greatly appreciate any help. Thanks!

Answers


  1. You may resort to the old-school way of doing a ‘distinct on’ selection, using the row_number window function:

    select distinct on ("group", foo, bar) * from 
    (
     select *, 
       row_number() over (partition by "group", foo, bar order by "timestamp") rnk
     from foobar
    ) t
    where rnk > 1 -- ignore the initial set of values
    order by "group", foo, bar, rnk; -- and pick the earliest record
    

  2. A way to solve your issue would be this:

    select fb1."group", fb1."foo", fb1."bar"
    from foobar fb1
    left join foobar fb2
    on fb1."group" = fb2."group" and
       fb2."timestamp" > fb1."timestamp" and
       (fb2."foo" <> fb1."foo" or fb2."bar" <> fb1."bar")
    left join foobar fb3
    on fb1."group" = fb3."group" and
       fb3."timestamp" < fb1."timestamp" and
       fb3."foo" = fb1."foo" and
       fb3."bar" = fb1."bar"
    left join foobar fb4
    on fb4."group" = fb1."group" and
       fb4."timestamp" < fb1."timestamp" and fb4."timestamp" > fb3."timestamp" and
       (fb4."foo" <> fb1."foo" or fb4."bar" <> fb1."bar")
    where fb2."group" is null and (fb3."group" is null or fb4."group" is not null)
    group by fb1."group", fb1."foo", fb1."bar"
    

    Explanation: We are searching for the fb1 records for which there is no newer fb2 record with a different foo/bar combination, and for which either no older fb3 record with the same combination exists or, if such an fb3 exists, an in-between fb4 record with a different combination also exists.
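
    If the timestamp itself is what you are after, the first condition (no newer row with a different combination) can be sketched on its own with NOT EXISTS. This is a hedged variant rather than the query above: the unchanged_since alias is made up for illustration, and it assumes foo and bar are never NULL and that timestamps are unique within a group.

    ```sql
    -- Keep only rows that are never followed, within the same group,
    -- by a row with a different (foo, bar); those rows form the final
    -- unchanged run, and the smallest timestamp among them is its start.
    select fb1."group", fb1.foo, fb1.bar,
           min(fb1."timestamp") as unchanged_since  -- hypothetical alias
    from foobar fb1
    where not exists (
        select 1
        from foobar fb2
        where fb2."group" = fb1."group"
          and fb2."timestamp" > fb1."timestamp"
          and (fb2.foo <> fb1.foo or fb2.bar <> fb1.bar)
    )
    group by fb1."group", fb1.foo, fb1.bar;
    ```

    For the sample data only rows 3 and 4 pass the filter, so the result is group 1, foo 10, bar 20, unchanged_since 3.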

  3. That’s a gaps-and-islands problem.
    You can compare each row’s (foo,bar) to the previous row’s using lag((foo,bar)). The window definition restricts the comparison to rows from the same group, in ascending timestamp order. Demo:

    select id, "group", foo, bar, "timestamp"
    from (select *,
                 -- true for the first row of a group and whenever (foo,bar) changes
                 coalesce((foo, bar) <> lag((foo, bar)) over w1, true) as is_diff_from_prev
          from foobar
          window w1 as (partition by "group" order by "timestamp")) _
    where is_diff_from_prev;
    
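    +------+---------+-------+-------+-------------+
    | "id" | "group" | "foo" | "bar" | "timestamp" |
    +------+---------+-------+-------+-------------+
    | 1    | 1       | 10    | 20    | 1           |
    | 2    | 1       | 11    | 19    | 2           |
    | 3    | 1       | 10    | 20    | 3           |
    +------+---------+-------+-------+-------------+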

    Row 4 is missing because its (foo,bar) wasn’t different from row 3’s. The subquery is required because you can’t call a window function directly in a WHERE clause.
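
    To get from there to the single row per group that the question asks about, one option is to keep only the most recent change point. A sketch, assuming timestamps are unique within a group; it reuses the subquery above and only adds distinct on with a descending sort:

    ```sql
    -- Each surviving row marks the start of a run of identical (foo,bar)
    -- values; distinct on ("group") with "timestamp" desc keeps the
    -- latest such start, i.e. the start of the current run.
    select distinct on ("group") id, "group", foo, bar, "timestamp"
    from (select *,
                 coalesce((foo, bar) <> lag((foo, bar)) over w1, true) as is_diff_from_prev
          from foobar
          window w1 as (partition by "group" order by "timestamp")) _
    where is_diff_from_prev
    order by "group", "timestamp" desc;
    ```

    For the sample data this returns only the row with id 3.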
