I have a table where I record events. These events will come into the system once a minute, per user – on the order of tens of thousands a minute. But I don’t need to save all of them.
If an event comes in within 90 seconds of the previous event, I want to update the previous row. If it’s been more than 90 seconds, I want to insert a new row.
Example table:
create table events (
  id serial,
  user_id int references users(id) not null,
  created_at timestamp not null default now(),
  updated_at timestamp not null default now(),
  ...some other event columns here...
);
create index idx_events_user_id_updated_at on events (user_id, updated_at desc);
And the pseudocode would be something like:
- Fetch the last event for the user
- If its `updated_at` is within the last 90 seconds:
  - Update that row, setting `updated_at` to `now()`
- Otherwise:
  - Insert a new row
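Spelled out as two separate statements, the pseudocode would be roughly the following (a sketch, not atomic on its own; `payload` is a hypothetical stand-in for the other event columns):

```sql
-- Fold "fetch + check" into the UPDATE's WHERE clause.
update events
set    updated_at = now(),
       payload    = $2
where  user_id = $1
and    updated_at >= now() - interval '90 seconds';

-- Run only when the UPDATE above reports zero affected rows:
insert into events (user_id, payload)
values ($1, $2);
```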
Is there a way to do this with a single Postgres statement?
I’m aware of `on conflict`, but I don’t think it’ll work for this use-case. The `(user_id, updated_at)` pair could be defined as a unique constraint, which could be used to trigger an `on conflict`, but the timestamps are arbitrary. These events come in "every minute", but not exactly on the minute (or even exactly a minute apart, due to network latency, server latency, etc.) – hence using 90 seconds, to give a 30-second buffer. Truncating the timestamps to the minute would reduce the feature’s usefulness, so I’d hate to do that just to handle upserts more cleanly.
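For reference, the rejected truncation workaround would look roughly like this (a sketch; it gives `on conflict` a unique target to arbitrate on, but loses the sub-minute precision described above):

```sql
-- Unique per user per calendar minute.
create unique index events_user_minute_uni
  on events (user_id, date_trunc('minute', updated_at));

insert into events (user_id)
values ($1)
on conflict (user_id, date_trunc('minute', updated_at))
do update set updated_at = now();
```

Note that two events 2 seconds apart but straddling a minute boundary would not conflict, which is exactly why this doesn’t fit the 90-second requirement.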
2 Answers
I think your best option is to create a stored procedure with an upsert block, i.e.:
- Start by assuming that the record exists and do an `UPDATE`. Include a `WHERE` clause that checks whether the record was updated less than 90 seconds ago.
- If nothing was updated, it means that either there was no previous record or the previous record was updated more than 90 seconds ago – either way you need to do an `INSERT`.

From the client’s perspective there will only be a single call (execute the procedure), and most of the time there will only be a single `UPDATE` on the server; however, the second `INSERT`
will be needed sometimes.

This can be done with an UPSERT command. You need an exclusion constraint on the timestamp range. For the leading integer column `user_id`, the additional module `btree_gist` must be installed. See:

Notably, the same doesn’t work for `timestamptz`, because `timestamptz + interval` is only `STABLE`, while `timestamp + interval` is `IMMUTABLE`, as required for the implied GiST index (or any index, for that matter).

Query: fiddle
This assumes at most one row per `user_id` in the input, and no concurrent, competing writes. Else, there may be race conditions due to the unavoidable time gap between `INSERT` & `UPDATE`. (`ON CONFLICT ... DO UPDATE` is not allowed based on an exclusion constraint.)

Also, no ambiguous data types in the input, or you must add an explicit cast. See:
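The fiddle itself isn’t reproduced above, but a setup and query along the lines this answer describes could look like this (my sketch, not the original fiddle; assumes `timestamp` columns as in the question):

```sql
create extension if not exists btree_gist;

-- Forbid two rows for the same user whose 90-second windows overlap.
-- timestamp + interval is IMMUTABLE, so this expression is allowed.
alter table events add constraint events_user_90s_excl
  exclude using gist (
    user_id with =,
    tsrange(updated_at, updated_at + interval '90 seconds') with &&
  );

-- Try the UPDATE first; fall back to INSERT when no row qualified.
-- A concurrent competing write surfaces as an exclusion violation
-- rather than a silent duplicate.
with upd as (
   update events
   set    updated_at = now()
   where  user_id = $1
   and    updated_at > now() - interval '90 seconds'
   returning id
   )
insert into events (user_id)
select $1
where  not exists (select from upd);
```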
Depending on details of your case, there may well be faster solutions. Like, just write all incoming rows, and later consolidate in bulk operations.
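That consolidation could be sketched with window functions, e.g. (a hedged sketch: it marks each burst of events less than 90 seconds apart, then keeps the burst’s first row stamped with its last timestamp):

```sql
with flagged as (
  -- 1 where a row starts a new burst (gap to previous row > 90 s), else 0
  select id, user_id, updated_at,
         case when updated_at - lag(updated_at) over w <= interval '90 seconds'
              then 0 else 1 end as is_start
  from   events
  window w as (partition by user_id order by updated_at)
), grouped as (
  -- running sum of start flags = burst number per user
  select *, sum(is_start) over (partition by user_id order by updated_at) as grp
  from   flagged
), ranked as (
  select id,
         row_number() over (partition by user_id, grp order by updated_at) as rn,
         max(updated_at) over (partition by user_id, grp) as last_seen
  from   grouped
)
update events e
set    updated_at = r.last_seen
from   ranked r
where  e.id = r.id
and    r.rn = 1;
-- A follow-up DELETE built on the same CTEs (where rn > 1)
-- removes the rows that were folded into the survivors.
```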
While sticking with this solution, it may be faster to add a generated column for the `tsrange` and base the exclusion constraint (and its implied GiST index) on that. It bloats the table row, though. See:
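The first answer’s stored-procedure approach could be sketched like so (assuming the `events` table from the question; the function name and the bare `user_id`-only insert are illustrative):

```sql
create or replace function record_event(p_user_id int)
returns void
language plpgsql as
$$
begin
   -- Optimistically assume a recent row exists and touch it.
   update events
   set    updated_at = now()
   where  user_id = p_user_id
   and    updated_at >= now() - interval '90 seconds';

   -- No recent row (or no row at all): insert a fresh one.
   if not found then
      insert into events (user_id) values (p_user_id);
   end if;
end
$$;

-- Single call from the client:
select record_event(42);
```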