Postgresql - SQL Select rows between starting flag and make linestring over geometries

Frank
November 7, 2023
158 views
1 vote
2 Answers

I have a question that I’m sure you experts can easily answer 🙂

from a table in postgresql/postgis with the following content

ID | car | timestamp | geom | startflag
903 | test1 | 03.11.23 08:38:52 | 0101000020E61000001D33F55 | 4
904 | test1 | 03.11.23 08:39:02 | 0101000020E6100000A40117B | 0
905 | test1 | 03.11.23 08:39:13 | 0101000020E6100000740179C | 0
906 | test1 | 03.11.23 08:39:23 | 0101000020E6100000E68013C | 0
907 | test1 | 03.11.23 08:39:34 | 0101000020E610000014B1886 | 0
908 | test1 | 03.11.23 08:39:44 | 0101000020E61000004309E9D | 4
909 | test1 | 03.11.23 08:39:54 | 0101000020E61000004C6B781 | 0
910 | test1 | 03.11.23 08:40:05 | 0101000020E61000002DC6658 | 0
911 | test1 | 03.11.23 08:40:15 | 0101000020E6100000A0B591A | 0
912 | test1 | 03.11.23 08:40:26 | 0101000020E61000000E1A55E | 0
913 | test1 | 03.11.23 08:40:36 | 0101000020E6100000D7DF128 | 0
914 | test1 | 03.11.23 08:40:46 | 0101000020E61000001C1E673 | 4
915 | test1 | 03.11.23 08:40:57 | 0101000020E6100000DD24068 | 0
916 | test1 | 03.11.23 08:41:07 | 0101000020E6100000CBE48A4 | 0
917 | test1 | 03.11.23 08:41:17 | 0101000020E610000093FFC9D | 0
918 | test1 | 03.11.23 08:41:28 | 0101000020E61000000B3A6B0 | 0

I need all records with startflag=0 after (and with) startflag = 4 row as a path (Postgis Linestring over geom) order by timestamp, until and without next startflag=4

From next startflag=4 then next path and so on…

I tried lead and partitioning but I don’t know much about it

Tags: postgis postgresql

Answers

- PaulvanderKroft
- November 7, 2023 at 12:19 pm
- 0 votes
0
As always, there are several possible means to obtain what you want. The following SQL-statement is only one of them:
```
WITH reorder AS (
  SELECT id, car, timestamp
  , row_number() OVER (PARTITION BY car ORDER BY timestamp) AS tcid
  , count(*) OVER (PARTITION BY car) AS max_tcid
  , geom, startflag 
  FROM mytable
), trackranges AS (
  SELECT car, tcid AS tcid_start
  , COALESCE(LEAD(tcid, 1) OVER (partition by car order by timestamp) - 1, max_tcid) AS tcid_last
  FROM reorder
  WHERE startflag = 4
)
SELECT car, tcid_start, ST_MakeLine(geom ORDER BY tcid) AS geom
FROM trackranges AS tr
JOIN reorder AS ro USING (car)
WHERE ro.tcid BETWEEN tr.tcid_start AND tr.tcid_last
GROUP BY car, tcid_start
HAVING count(*) > 1
ORDER BY car, tcid_start;
```
Some remarks:
- CTE "reorder" assigns sequential numbers to each car, ordered by timestamp (column "tcid"). This is actually superfluous, but using integers later on instead of timestamps is my personal preference;
- the main point of CTE "reorder" is the addition of the total number of rows for each car (stored as "max_tcid"), to be used later in a COALESCE function;
- CTE "trackranges" selects all the starting points (startflag = 4) for any car and uses the LEAD function to get the "tcid" of the next starting point for that car;
- the last point for the current track however will have a tcid equal to "next starting tcid minus 1", so the LEAD result needs to be decremented (and this is far easier with a continuous number sequence in column "tcid" than with timestamps);
- the last starting point for any car will not have a "next starting point" (the LEAD function will return NULL, from which 1 is subtracted but that still produces a NULL), in which case the previously calculated "tcid_last" comes into play;
- CTE "trackranges" therefore yields only 3 columns: the car identification and the first and the last "tcid" to be used for each linestring;
- the main SELECT joins both CTEs on the basis of the car identification, using the "tcid_start" column from "trackranges" to identify each separate track, adding only those points from "reorder" that have a tcid between tcid_start and tcid_last;
- the collected data is grouped by car and track, for which the geometries are aggregated into a linestring (ordered by the tcid, which in turn was ordered by timestamp);
- if the last record for any car was a start point for a new (incomplete) track, the resulting linestring would be invalid (containing only a single point). This is prevented by the HAVING clause, ensuring that only groups with more than 1 point are processed.
Login or Signup to reply.

- PaulvanderKroft
- November 7, 2023 at 1:35 pm
- 0 votes
0
As said: there are more ways to achieve the same result. This one can be a bit of a puzzle, but it works:
```
WITH reorder AS (
  SELECT id, car, timestamp
  , row_number() OVER (PARTITION BY car ORDER BY timestamp) AS tcid
  , geom, startflag 
  FROM mytable
), tracks AS (
  SELECT car, tcid, ROW_NUMBER() OVER (PARTITION BY car ORDER BY tcid) AS track, geom
  FROM reorder
  WHERE startflag = 4
UNION ALL
  SELECT car, tcid, tcid - ROW_NUMBER() OVER (PARTITION BY car ORDER BY tcid) AS track, geom
  FROM reorder
  WHERE startflag <> 4
)
SELECT car, track, ST_MakeLine(geom ORDER BY tcid) AS geom
FROM tracks
GROUP BY car, track
HAVING count(*) > 1
ORDER BY car, track;
```
Some explanation:
- CTE "reorder" adds a continuous number sequence to all the collected points for each car, stored as "tcid" (calculating a "tcid_last" is not necessary);
- the first part of CTE "tracks" collects only the starting points (startflag = 4) from the previous CTE, and assigns a new sequence number (to be used as a track number) starting with 1 for every car (handled by PARTITION BY);
- after the UNION ALL, the second part of CTE "tracks" collects all the non-starting points (startflag <> 4) and assigns a new continuous sequence number to the obtained records, starting with 1 for each car;
- the resulting sequence number is subtracted from the old "tcid" that was obtained in CTE "reorder", resulting in a track number that corresponds to the re-assigned track numbers in the first part (prior to UNION ALL) of CTE "tracks";
- in the main query at the end, the tracks can again be grouped by car and track, aggregating the points into a linestring, with a HAVING clause that eliminates all tracks with only 1 point.
Why does it work?

Let’s say we started with the following records (abbreviated column list):
```
 car  | tcid | flag
-------------------
test1 | 1    | 4
test1 | 2    | 0
test1 | 3    | 0
test1 | 4    | 4
test1 | 5    | 0
```
Obviously there are 2 tracks for car ‘test1’, starting at tcid 1 and 4, so the first part of CTE "tracks" would result in the following table (where column "track" is filled with a new number sequence, starting with 1):
```
 car  | tcid | track
-------------------
test1 | 1    | 1
test1 | 4    | 2
```
The second part would (temporarily) contain the following values:
```
 car  | tcid | row_number | tcid - row_number AS track
------------------------------------------------------
----[tcid 1 is eliminated by the WHERE condition)----
test1 | 2    | 1          | 1
test1 | 3    | 2          | 1
----[tcid 4 is eliminated by the WHERE condition)----
test1 | 5    | 3          | 2
```
In this result list, the newly created row_numbers again form a series of consecutive numbers, but the original tcid’s now have ‘holes’ in their sequence (some records were filtered away, when they had a startflag = 4).
- Had there been no ‘holes’ in any number sequence, both starting with ‘1’, the subtraction would always have been ‘0’ (a constant)
- in parts of the table where tcid is constantly increasing with 1 for every new record, the newly created row_number is also increasing with 1, so the subtraction again yields a constant value (until a new ‘hole’ appears). In other words: tcid’s 2 and 3 are consecutive, and also the new row_numbers 1 and 2, so the subtraction has equal values;
- there are as many ‘holes’ in the tcid-sequence as there are track starts, and with every ‘hole’ the subtraction results in a new track number (tcid – row_number) equal to the previous track number plus 1;
- therefore the column calculated as "tcid – row_number" will result in a track number corresponding to the n-th track from the first part of the CTE (the part before UNION ALL).
One caveat: if the first inserted record in the original table was (for whatever reason) not flagged as a starting point, this query will create a linestring (given enough points for that track) but will assign a track number 0. That could quite easily be eliminated in the main query, using WHERE track > 0.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – SQL Select rows between starting flag and make linestring over geometries

Answers