I have a PostGIS table that stores the locations a device has passed through in the following way:
| ts | lat | lon |
|---|---|---|
| 2022/12/23 10:04 | -3.2654 | -79.9234 |
| 2022/12/23 10:14 | -3.2654 | -79.9234 |
| 2022/12/23 10:24 | -2.95592 | -79.6833 |
Because the device transmits every 10 minutes, there are a lot of duplicated locations. I want to filter the device's positions so that I keep only the points that are at least 10 km away from every other point the device has passed through.
I tried the SQL below. It removes the duplicates that are exactly the same point, but I also want to add the distance check against the other points in the table.
SELECT MIN(ts) AS arrivalTime, MAX(ts) AS departureTime, "lat", "lon" FROM "logsData"
WHERE "lat" IS NOT NULL
GROUP BY lat, lon
ORDER BY MAX(ts);
Is there a way to do this?
Thank you in advance
2 Answers
Run a self-join on the logsData so that there is a pairwise comparison of the locations, and then add a column to store the calculated distances. You may want to eliminate the duplicate locations before the self-join.
If you cannot add a column to a CTE, create a new table instead of using a CTE, and then add a column to it for the distance values.
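A minimal sketch of that self-join, assuming the PostGIS extension is installed and that lat and lon are numeric columns as in the question. The names distinct_pts and distance_m are hypothetical and chosen only for illustration. Casting the points to geography makes ST_Distance return metres.

```sql
-- Sketch only: assumes PostGIS and the "logsData"(ts, lat, lon) layout
-- from the question. distinct_pts and distance_m are illustrative names.
WITH distinct_pts AS (
    -- Eliminate exact duplicate locations before the pairwise comparison.
    SELECT DISTINCT lat, lon
    FROM "logsData"
    WHERE lat IS NOT NULL
      AND lon IS NOT NULL
)
SELECT a.lat AS lat_a,
       a.lon AS lon_a,
       b.lat AS lat_b,
       b.lon AS lon_b,
       -- ST_MakePoint takes (x, y), i.e. (lon, lat); geography gives metres.
       ST_Distance(
           ST_SetSRID(ST_MakePoint(a.lon, a.lat), 4326)::geography,
           ST_SetSRID(ST_MakePoint(b.lon, b.lat), 4326)::geography
       ) AS distance_m
FROM distinct_pts AS a
JOIN distinct_pts AS b
  ON (a.lat, a.lon) < (b.lat, b.lon)  -- each unordered pair once, no self-pairs
ORDER BY distance_m;
```

The join compares every distinct location with every other one, so the number of rows grows quadratically with the number of distinct locations. Removing duplicates first, as suggested above, keeps that manageable.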
Requirements
The rows of logsData (ts, lat, lon) can be expressed as gg1, gg2, gg3, ..., ggN. Reporting the data points that are 10 km apart requires calculating distances as follows:
If distance(gg1, gg2) >= 10 km, report gg1 and gg2.
If not, calculate distance(gg1, ggX) for the following points until the distance >= 10 km, then report ggX.
Now use ggX as the origin and find the next ggY so that distance(ggX, ggY) >= 10 km.
Repeat until all required data points are reported.
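One possible way to express these requirements directly is a recursive CTE. This is a sketch, not a tested solution. It assumes PostGIS is installed, that ts values are unique, and that lat and lon are numeric. The names pts, walk, nxt and geog are hypothetical.

```sql
-- Sketch only: greedy walk over the points in time order.
-- Assumes PostGIS, unique ts values, and the "logsData"(ts, lat, lon) layout.
WITH RECURSIVE pts AS (
    SELECT ts, lat, lon,
           ST_SetSRID(ST_MakePoint(lon, lat), 4326)::geography AS geog
    FROM "logsData"
    WHERE lat IS NOT NULL
      AND lon IS NOT NULL
),
walk AS (
    -- Start from the earliest point (gg1) as the first origin.
    (SELECT ts, lat, lon, geog
     FROM pts
     ORDER BY ts
     LIMIT 1)
    UNION ALL
    -- From the current origin, take the first later point that is
    -- at least 10 km (10000 m) away; it becomes the next origin.
    SELECT nxt.ts, nxt.lat, nxt.lon, nxt.geog
    FROM walk AS w
    CROSS JOIN LATERAL (
        SELECT p.ts, p.lat, p.lon, p.geog
        FROM pts AS p
        WHERE p.ts > w.ts
          AND ST_Distance(p.geog, w.geog) >= 10000
        ORDER BY p.ts
        LIMIT 1
    ) AS nxt
)
SELECT ts, lat, lon
FROM walk
ORDER BY ts;
```

The recursion stops when no later point is 10 km or more from the current origin. Note that each reported point is only guaranteed to be 10 km from the previously reported one, which is what the steps above describe, not from every earlier reported point.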
Until an SQL guru comes up with an answer, here's a workaround.
Workaround
Calculate the distance between adjacent data points (consecutive readings, 10 minutes apart)
Calculate the cumulative distance at each data point
Divide all data points into 10 km segments: 0-10 km, 10-20 km, 20-30 km, ...
Pick the first data point in each segment.
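The four steps above can be sketched with window functions. This is an illustration under assumptions, not the answerer's original query: it assumes PostGIS is installed and that lat and lon are numeric, and the names pts, steps, cumulative, step_m, cum_m and segment are hypothetical.

```sql
-- Sketch only: assumes PostGIS and the "logsData"(ts, lat, lon) layout.
WITH pts AS (
    SELECT ts, lat, lon,
           ST_SetSRID(ST_MakePoint(lon, lat), 4326)::geography AS geog
    FROM "logsData"
    WHERE lat IS NOT NULL
      AND lon IS NOT NULL
),
steps AS (
    -- Step 1: distance in metres from the previous reading (0 for the first).
    SELECT ts, lat, lon,
           COALESCE(ST_Distance(geog, LAG(geog) OVER (ORDER BY ts)), 0) AS step_m
    FROM pts
),
cumulative AS (
    -- Step 2: running total of the distance travelled.
    SELECT ts, lat, lon,
           SUM(step_m) OVER (ORDER BY ts) AS cum_m
    FROM steps
),
segmented AS (
    -- Step 3: 10 km buckets: 0 for 0-10 km, 1 for 10-20 km, and so on.
    SELECT ts, lat, lon, cum_m,
           FLOOR(cum_m / 10000)::int AS segment
    FROM cumulative
)
-- Step 4: keep the earliest reading in each segment.
SELECT DISTINCT ON (segment) segment, ts, lat, lon, cum_m
FROM segmented
ORDER BY segment, ts;
```

Because the segments are based on cumulative travelled distance rather than straight-line distance, a device that moves back and forth can produce reported points that are closer together than 10 km. Changing 10000 to 10 gives 10 m segments.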
The output is similar to the following (note that the test data below are based on 10 m segments for demonstration purposes):