I have a Postgres table, foobar, with the following schema:
id: integer
group: integer
foo: integer
bar: integer
timestamp: integer
I keep track of updates to various groups, and those updates mutate the foo and bar properties. Every time I receive an update, I store the timestamp.
Here’s an example of the values I could have in the database:
+------+---------+-------+-------+-------------+
| "id" | "group" | "foo" | "bar" | "timestamp" |
+------+---------+-------+-------+-------------+
| 1    | 1       | 10    | 20    | 1           |
| 2    | 1       | 11    | 19    | 2           |
| 3    | 1       | 10    | 20    | 3           |
| 4    | 1       | 10    | 20    | 4           |
+------+---------+-------+-------+-------------+
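For anyone who wants to reproduce this, here is a minimal setup sketch. The column types come from the schema above; the primary key on id is an assumption on my part.

```sql
-- Sketch of the table described above; the PRIMARY KEY is an assumption.
-- "group" is a reserved word in Postgres, so it has to be double-quoted.
CREATE TABLE foobar (
    id        integer PRIMARY KEY,
    "group"   integer,
    foo       integer,
    bar       integer,
    timestamp integer
);

-- The four example rows from the table above.
INSERT INTO foobar (id, "group", foo, bar, timestamp) VALUES
    (1, 1, 10, 20, 1),
    (2, 1, 11, 19, 2),
    (3, 1, 10, 20, 3),
    (4, 1, 10, 20, 4);
```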
Oftentimes, the updates I receive are identical. A particularly critical piece of information I’m trying to extract is when I first received the combination of values that is current: not the first time I ever saw that combination, but the first update after which there haven’t been any changes. In the example above, that is the row with id 3.
A naïve approach would be the following query:
SELECT DISTINCT ON ("group", foo, bar) *
FROM foobar
ORDER BY "group", foo, bar, timestamp DESC;
However, that query would return the last row, which has the latest timestamp. If I switch timestamp to ASC, I would get the very first row, because I have seen the exact foo/bar value combination prior to the update at timestamp 2.
The intuitive thing would have been to simply move the timestamp DESC sort key ahead of foo, but Postgres does not allow that: the DISTINCT ON expressions must match the leftmost ORDER BY expressions. MySQL’s more permissive HAVING clause could also have come in handy, but HAVING in Postgres does not work the same way.
An incredibly inefficient approach would be to iterate through each group programmatically: get the latest row, then fetch all rows in descending timestamp order and stop as soon as I observe a change. It seems that letting the database do this sort of operation would be wiser.
I am quite certain that I’m missing something obvious, but would greatly appreciate any help. Thanks!
3 Answers
You may resort to the old-school way of doing a ‘distinct on’ selection, using the row_number window function:
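A minimal sketch of what a row_number-based query could look like, assuming the table and column names from the question. This is one possible reading of the approach rather than the only way to write it: it first keeps the rows that are never followed by a different (foo, bar) in the same group, then numbers them oldest-first and keeps number 1. The aliases f, nxt, numbered and rn are made up for illustration.

```sql
SELECT id, "group", foo, bar, timestamp
FROM (
    SELECT f.*,
           -- Number the surviving rows per group, oldest first.
           row_number() OVER (PARTITION BY f."group"
                              ORDER BY f.timestamp) AS rn
    FROM foobar AS f
    -- Keep only rows with no later, different (foo, bar) in the same group,
    -- i.e. the trailing run of identical updates.
    WHERE NOT EXISTS (
        SELECT 1
        FROM foobar AS nxt
        WHERE nxt."group" = f."group"
          AND nxt.timestamp > f.timestamp
          AND (nxt.foo, nxt.bar) IS DISTINCT FROM (f.foo, f.bar)
    )
) AS numbered
WHERE rn = 1;
```

On the sample data from the question, rows 3 and 4 survive the filter and row 3 gets rn = 1, so the query should return the row with id 3.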
A way to solve your issue would be this:
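A sketch of such a query, assuming the table and column names from the question and the aliases fb1 to fb4 used in the explanation; treat it as an illustration of that logic, not a tuned implementation.

```sql
SELECT fb1.*
FROM foobar AS fb1
-- No newer fb2 in the same group with a different (foo, bar).
WHERE NOT EXISTS (
        SELECT 1
        FROM foobar AS fb2
        WHERE fb2."group" = fb1."group"
          AND fb2.timestamp > fb1.timestamp
          AND (fb2.foo, fb2.bar) IS DISTINCT FROM (fb1.foo, fb1.bar)
      )
-- No older fb3 with the same (foo, bar), unless some fb4 between fb3 and fb1
-- carries a different combination.
  AND NOT EXISTS (
        SELECT 1
        FROM foobar AS fb3
        WHERE fb3."group" = fb1."group"
          AND fb3.timestamp < fb1.timestamp
          AND (fb3.foo, fb3.bar) IS NOT DISTINCT FROM (fb1.foo, fb1.bar)
          AND NOT EXISTS (
                SELECT 1
                FROM foobar AS fb4
                WHERE fb4."group" = fb1."group"
                  AND fb4.timestamp > fb3.timestamp
                  AND fb4.timestamp < fb1.timestamp
                  AND (fb4.foo, fb4.bar) IS DISTINCT FROM (fb1.foo, fb1.bar)
              )
      );
```

On the sample data this should keep only the row with id 3: row 4 is dropped because row 3 is an older match with nothing different in between, while row 1 does not disqualify row 3 because row 2 sits between them.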
Explanation: We are searching for the fb1 records for which there is no newer fb2 record with a different foo and bar combination, nor an older fb3 record with the same combination, or, if such an fb3 exists, then an in-between fb4 also exists with a different combination.
That’s a gaps-and-islands problem.
You can compare each row’s (foo,bar) to the previous row’s using lag((foo,bar)). The window definition lets you only check those coming from the same group, in ascending timestamp order. In the result, row 4 is missing because its (foo,bar) wasn’t different from row 3’s. The subquery is required because you can’t call the window function directly in a where clause.
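The lag comparison described above can be sketched roughly like this, assuming the table and column names from the question. The names changed, marked and w are made up for illustration, and IS DISTINCT FROM is used so that the first row of each group, which has no previous row, counts as a change.

```sql
-- Every row that starts a new run of (foo, bar) values within its group.
SELECT id, "group", foo, bar, timestamp
FROM (
    SELECT *,
           (foo, bar) IS DISTINCT FROM lag((foo, bar)) OVER w AS changed
    FROM foobar
    WINDOW w AS (PARTITION BY "group" ORDER BY timestamp)
) AS marked
WHERE changed;

-- Only the most recent change per group, i.e. the start of the current run.
SELECT DISTINCT ON ("group") id, "group", foo, bar, timestamp
FROM (
    SELECT *,
           (foo, bar) IS DISTINCT FROM lag((foo, bar)) OVER w AS changed
    FROM foobar
    WINDOW w AS (PARTITION BY "group" ORDER BY timestamp)
) AS marked
WHERE changed
ORDER BY "group", timestamp DESC;
```

The first query lists every change; the second adds DISTINCT ON to keep just the latest one per group, which is the row the question is after.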