skip to Main Content

I have a question that I’m sure you experts can easily answer 🙂

from a table in postgresql/postgis with the following content

ID | car | timestamp | geom | startflag
903 | test1 | 03.11.23 08:38:52 | 0101000020E61000001D33F55 | 4
904 | test1 | 03.11.23 08:39:02 | 0101000020E6100000A40117B | 0
905 | test1 | 03.11.23 08:39:13 | 0101000020E6100000740179C | 0
906 | test1 | 03.11.23 08:39:23 | 0101000020E6100000E68013C | 0
907 | test1 | 03.11.23 08:39:34 | 0101000020E610000014B1886 | 0
908 | test1 | 03.11.23 08:39:44 | 0101000020E61000004309E9D | 4
909 | test1 | 03.11.23 08:39:54 | 0101000020E61000004C6B781 | 0
910 | test1 | 03.11.23 08:40:05 | 0101000020E61000002DC6658 | 0
911 | test1 | 03.11.23 08:40:15 | 0101000020E6100000A0B591A | 0
912 | test1 | 03.11.23 08:40:26 | 0101000020E61000000E1A55E | 0
913 | test1 | 03.11.23 08:40:36 | 0101000020E6100000D7DF128 | 0
914 | test1 | 03.11.23 08:40:46 | 0101000020E61000001C1E673 | 4
915 | test1 | 03.11.23 08:40:57 | 0101000020E6100000DD24068 | 0
916 | test1 | 03.11.23 08:41:07 | 0101000020E6100000CBE48A4 | 0
917 | test1 | 03.11.23 08:41:17 | 0101000020E610000093FFC9D | 0
918 | test1 | 03.11.23 08:41:28 | 0101000020E61000000B3A6B0 | 0

I need all records with startflag=0 after (and with) startflag = 4 row as a path (Postgis Linestring over geom) order by timestamp, until and without next startflag=4

From next startflag=4 then next path and so on…

I tried lead and partitioning but I don’t know much about it

2

Answers


  1. As always, there are several possible means to obtain what you want. The following SQL-statement is only one of them:

    WITH reorder AS (
      SELECT id, car, timestamp
      , row_number() OVER (PARTITION BY car ORDER BY timestamp) AS tcid
      , count(*) OVER (PARTITION BY car) AS max_tcid
      , geom, startflag 
      FROM mytable
    ), trackranges AS (
      SELECT car, tcid AS tcid_start
      , COALESCE(LEAD(tcid, 1) OVER (partition by car order by timestamp) - 1, max_tcid) AS tcid_last
      FROM reorder
      WHERE startflag = 4
    )
    SELECT car, tcid_start, ST_MakeLine(geom ORDER BY tcid) AS geom
    FROM trackranges AS tr
    JOIN reorder AS ro USING (car)
    WHERE ro.tcid BETWEEN tr.tcid_start AND tr.tcid_last
    GROUP BY car, tcid_start
    HAVING count(*) > 1
    ORDER BY car, tcid_start;
    

    Some remarks:

    • CTE "reorder" assigns sequential numbers to each car, ordered by timestamp (column "tcid"). This is actually superfluous, but using integers later on instead of timestamps is my personal preference;
    • the main point of CTE "reorder" is the addition of the total number of rows for each car (stored as "max_tcid"), to be used later in a COALESCE function;
    • CTE "trackranges" selects all the starting points (startflag = 4) for any car and uses the LEAD function to get the "tcid" of the next starting point for that car;
    • the last point for the current track however will have a tcid equal to "next starting tcid minus 1", so the LEAD result needs to be decremented (and this is far easier with a continuous number sequence in column "tcid" than with timestamps);
    • the last starting point for any car will not have a "next starting point" (the LEAD function will return NULL, from which 1 is subtracted but that still produces a NULL), in which case the previously calculated "tcid_last" comes into play;
    • CTE "trackranges" therefore yields only 3 columns: the car identification and the first and the last "tcid" to be used for each linestring;
    • the main SELECT joins both CTEs on the basis of the car identification, using the "tcid_start" column from "trackranges" to identify each separate track, adding only those points from "reorder" that have a tcid between tcid_start and tcid_last;
    • the collected data is grouped by car and track, for which the geometries are aggregated into a linestring (ordered by the tcid, which in turn was ordered by timestamp);
    • if the last record for any car was a start point for a new (incomplete) track, the resulting linestring would be invalid (containing only a single point). This is prevented by the HAVING clause, ensuring that only groups with more than 1 point are processed.
    Login or Signup to reply.
  2. As said: there are more ways to achieve the same result. This one can be a bit of a puzzle, but it works:

    WITH reorder AS (
      SELECT id, car, timestamp
      , row_number() OVER (PARTITION BY car ORDER BY timestamp) AS tcid
      , geom, startflag 
      FROM mytable
    ), tracks AS (
      SELECT car, tcid, ROW_NUMBER() OVER (PARTITION BY car ORDER BY tcid) AS track, geom
      FROM reorder
      WHERE startflag = 4
    UNION ALL
      SELECT car, tcid, tcid - ROW_NUMBER() OVER (PARTITION BY car ORDER BY tcid) AS track, geom
      FROM reorder
      WHERE startflag <> 4
    )
    SELECT car, track, ST_MakeLine(geom ORDER BY tcid) AS geom
    FROM tracks
    GROUP BY car, track
    HAVING count(*) > 1
    ORDER BY car, track;
    

    Some explanation:

    • CTE "reorder" adds a continuous number sequence to all the collected points for each car, stored as "tcid" (calculating a "tcid_last" is not necessary);
    • the first part of CTE "tracks" collects only the starting points (startflag = 4) from the previous CTE, and assigns a new sequence number (to be used as a track number) starting with 1 for every car (handled by PARTITION BY);
    • after the UNION ALL, the second part of CTE "tracks" collects all the non-starting points (startflag <> 4) and assigns a new continuous sequence number to the obtained records, starting with 1 for each car;
    • the resulting sequence number is subtracted from the old "tcid" that was obtained in CTE "reorder", resulting in a track number that corresponds to the re-assigned track numbers in the first part (prior to UNION ALL) of CTE "tracks";
    • in the main query at the end, the tracks can again be grouped by car and track, aggregating the points into a linestring, with a HAVING clause that eliminates all tracks with only 1 point.

    Why does it work?

    Let’s say we started with the following records (abbreviated column list):

     car  | tcid | flag
    -------------------
    test1 | 1    | 4
    test1 | 2    | 0
    test1 | 3    | 0
    test1 | 4    | 4
    test1 | 5    | 0
    

    Obviously there are 2 tracks for car ‘test1’, starting at tcid 1 and 4, so the first part of CTE "tracks" would result in the following table (where column "track" is filled with a new number sequence, starting with 1):

     car  | tcid | track
    -------------------
    test1 | 1    | 1
    test1 | 4    | 2
    

    The second part would (temporarily) contain the following values:

     car  | tcid | row_number | tcid - row_number AS track
    ------------------------------------------------------
    ----[tcid 1 is eliminated by the WHERE condition)----
    test1 | 2    | 1          | 1
    test1 | 3    | 2          | 1
    ----[tcid 4 is eliminated by the WHERE condition)----
    test1 | 5    | 3          | 2
    

    In this result list, the newly created row_numbers again form a series of consecutive numbers, but the original tcid’s now have ‘holes’ in their sequence (some records were filtered away, when they had a startflag = 4).

    • Had there been no ‘holes’ in any number sequence, both starting with ‘1’, the subtraction would always have been ‘0’ (a constant)
    • in parts of the table where tcid is constantly increasing with 1 for every new record, the newly created row_number is also increasing with 1, so the subtraction again yields a constant value (until a new ‘hole’ appears). In other words: tcid’s 2 and 3 are consecutive, and also the new row_numbers 1 and 2, so the subtraction has equal values;
    • there are as many ‘holes’ in the tcid-sequence as there are track starts, and with every ‘hole’ the subtraction results in a new track number (tcid – row_number) equal to the previous track number plus 1;
    • therefore the column calculated as "tcid – row_number" will result in a track number corresponding to the n-th track from the first part of the CTE (the part before UNION ALL).

    One caveat: if the first inserted record in the original table was (for whatever reason) not flagged as a starting point, this query will create a linestring (given enough points for that track) but will assign a track number 0. That could quite easily be eliminated in the main query, using WHERE track > 0.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search