I have the following table:
CREATE TABLE trajectory(
user_id int,
session_id int,
timestamp timestamp with time zone,
lat double precision,
lon double precision
);
INSERT INTO trajectory(user_id, session_id, timestamp, lat, lon) VALUES
(1, 25304,'2008-10-23 02:53:04+01', 39.984702, 116.318417),
(1, 25304, '2008-10-23 02:53:10+01', 39.984683, 116.31845),
(1, 25304, '2008-10-23 02:53:15+01', 39.984686, 116.318417),
(1, 25304, '2008-10-23 02:53:20+01', 39.984688, 116.318385),
(1, 20959,'2008-10-24 02:09:59+01', 40.008304, 116.319876),
(1, 20959,'2008-10-24 02:10:04+01', 40.008413, 116.319962),
(1, 20959,'2008-10-24 02:10:14+01', 40.007171, 116.319458),
(2, 55305, '2008-10-23 05:53:05+01', 39.984094, 116.319236),
(2, 55305, '2008-10-23 05:53:11+01', 39.984198, 116.319322),
(2, 55305, '2008-10-23 05:53:21+01', 39.984224, 116.319402),
(2, 34104, '2008-10-23 23:41:04+01', 40.013867, 116.306473),
(2, 34104, '2008-10-23 23:41:16+01', 40.013907, 116.306488);
Question:
I want to add a column trip_id
to this table, based on the user_id
and session_id
columns, so that when the user’s session id changes, I know the user is on a new trip so I add that id to the new trip
column.
Required output:
user_id |session_id |timestamp | lat | lon | trip_id
--------|------------|-----------------------|--------------|-----------|-----------
1 | 25304 |2008-10-23 02:53:04+01 | 39.984702 |116.318417 | 1
1 | 25304 |2008-10-23 02:53:10+01 | 39.984683 |116.31845 | 1
1 | 25304 |2008-10-23 02:53:15+01 | 39.984686 |116.318417 | 1
1 | 25304 |2008-10-23 02:53:20+01 | 39.984688 |116.318385 | 1
1 | 20959 |2008-10-24 02:09:59+01 |40.008304 |116.319876 | 2
1 | 20959 |2008-10-24 02:10:04+01 |40.008413 |116.319962 | 2
1 | 20959 |2008-10-24 02:10:14+01 |40.007171 |116.319458 | 2
2 | 55305 |2008-10-23 05:53:05+01 |39.984094 |116.319236 | 1
2 | 55305 |2008-10-23 05:53:11+01 |39.984198 |116.319322 | 1
2 | 55305 |2008-10-23 05:53:21+01 |39.984224 |116.319402 | 1
2 | 34104 |2008-10-23 23:41:04+01 |40.013867 |116.306473 | 2
2 | 34104 |2008-10-23 23:41:16+01 |40.013907 |116.306488 | 2
How can I do this?
EDIT
Thanks for these great answers, but all the answers received are kind of retrieving table values, they do not modify the table. Plus, I added the timestamp
column that I initially omitted thinking that would simplify my question.
3
Answers
We can approach this with a gaps-and-island technique. Assuming that you have a column to order your dataset, say
ordering_id
:The idea is to bring the "preceding" session id of the same user with
lag
, then to compare it with the value on the current row; we can then count how many times it changed, which defines the trip id.fiddle
Yet another approach would involve:
DENSE RANK
by ordering on it.If you don’t have any ordering field, you can use "ctid" for that purpose: it’s a field that stores the physical location of each row within the table. Otherwise use the timestamp field in place of "ctid".
Output:
Check the demo here.
This can be accomplished by using the window function
lag()
to retrieve the previous row and determine whether or not it has changed, followed by the window functionsum()
to retrieve thetrip_id
:Assuming the user_id/timestamp is unique, the update can be :
Result :
Demo here