The routed_way
table has a column named way
where the path of a vehicle is stored. The length of these paths can vary greatly, with some being short and others containing millions of points. Although the table only has thousands of rows, there are two issues that need to be addressed.
- Deleting rows from this table is very slow and can take several minutes.
- Queries using the "st_intersects" function are also slow and take nearly one minute to complete.
create table public.routed_way
(
id uuid default gen_random_uuid() not null primary key,
created_at timestamp with time zone default CURRENT_TIMESTAMP,
updated_at timestamp with time zone,
sourcer_class text not null,
sourcer_id text not null,
route_start timestamp with time zone,
route_end timestamp with time zone not null,
way geometry(LineString, 4326) not null,
time_delta integer[],
elevation_delta integer[]
);
alter table public.routed_way owner to postgres;
create index idx_routed_way_created_at on public.routed_way (created_at);
create index idx_routed_way_sourcer_class on public.routed_way (sourcer_class);
create index idx_routed_way_route_start on public.routed_way (route_start);
create index idx_routed_way_way on public.routed_way using gist (way);
select
id,
route_start,
route_end,
st_transform(way, 3857) as geometry
from routed_way
where
(route_start > now() - interval '7 day'
and route_start < now() - interval '3 day')
and st_intersects(st_transform(way, 3857),!bbox!)
Gather (cost=1072.66..63827.33 rows=1 width=64)
Workers Planned: 1
-> Parallel Bitmap Heap Scan on routed_way (cost=72.66..62827.23 rows=1 width=64)
Recheck Cond: ((route_start > (now() - '7 days'::interval)) AND (route_start < (now() - '3 days'::interval)))
" Filter: st_intersects(st_transform(way, 3857), '0103000020110F00000100000005000000010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEFC7'::geometry)"
-> Bitmap Index Scan on idx_routed_way_route_start (cost=0.00..72.66 rows=2037 width=0)
Index Cond: ((route_start > (now() - '7 days'::interval)) AND (route_start < (now() - '3 days'::interval)))
I am unsure what values Mapnik is using for the !bbox!
so I attempted to use some values in st_makeenvelope()
, but according to the EXPLAIN
output, it seems that the desired index was not applied.
Explain analyze verbose select
id,
route_start,
route_end,
st_transform(way, 3857) as geometry
from routed_way
where
(route_start > now() - interval '7 day'
and route_start < now() - interval '3 day')
and st_intersects(way,ST_MakeEnvelope(-180, -90, 180, 90, 4326));
Bitmap Heap Scan on public.routed_way (cost=35.31..93324.71 rows=1856 width=64) (actual time=5.529..7573.988 rows=1848 loops=1)
" Output: id, route_start, route_end, st_transform(way, 3857)"
Recheck Cond: ((routed_way.route_start > (now() - '7 days'::interval)) AND (routed_way.route_start < (now() - '3 days'::interval)))
" Filter: st_intersects(routed_way.way, '0103000020E6100000010000000500000000000000008066C000000000008056C000000000008066C0000000000080564000000000008066400000000000805640000000000080664000000000008056C000000000008066C000000000008056C0'::geometry)"
Heap Blocks: exact=345
-> Bitmap Index Scan on idx_routed_way_route_start (cost=0.00..34.85 rows=1856 width=0) (actual time=0.091..0.092 rows=1848 loops=1)
Index Cond: ((routed_way.route_start > (now() - '7 days'::interval)) AND (routed_way.route_start < (now() - '3 days'::interval)))
Planning Time: 0.143 ms
Execution Time: 7574.252 ms
postgres=> d+ routed_way
Table "public.routed_way"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+---------------------------+-----------+----------+-------------------+----------+--------------+-------------
id | uuid | | not null | gen_random_uuid() | plain | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP | plain | |
updated_at | timestamp with time zone | | | | plain | |
sourcer_class | text | | not null | | extended | |
sourcer_id | text | | not null | | extended | |
route_start | timestamp with time zone | | | | plain | |
route_end | timestamp with time zone | | not null | | plain | |
way | geometry(LineString,4326) | | not null | | external | |
time_delta | integer[] | | | | extended | |
elevation_delta | integer[] | | | | extended | |
Indexes:
"routed_way_pkey" PRIMARY KEY, btree (id)
"idx_routed_way_created_at" btree (created_at)
"idx_routed_way_route_start" btree (route_start)
"idx_routed_way_sourcer_class" btree (sourcer_class)
"idx_routed_way_way" gist (way) CLUSTER
Referenced by:
TABLE "routed_point" CONSTRAINT "fk_routed_point_way_id" FOREIGN KEY (way_id) REFERENCES routed_way(id) ON UPDATE CASCADE ON DELETE CASCADE
Access method: heap
3
Answers
According to what you initially presented, you have no index on the
geometry
type columnway
. That’s causing the heap scan.ST_
operations on it.set storage external
.cluster
using that indexanalyze
fillfactor
of your index.You are also using
ST_Transform
on the wholeway
column instead of just the!bbox!
geometry you’re intersecting with, which seems counter-productive. If you reverse that, it will have to be done just once per statement instead of once per target table record:If your delete statements were using a similar, geometry-based
where
clause, you can speed them up the same way.In principle, the larger your geometries get the less useful your index will be: if the bbox of your geometry spans the entire target area, it’s assumed to be spatially related to everything. You can try and use
ST_Segmentize()
to cut it up into smaller segments with smaller bounding boxes, which will speed everything up dramatically, similar to howST_Subdivide()
is used (on all the Things).In an extreme case, if the
!bbox!
geometry you’re comparing to is large enough, you could considerst_subdividing
it it into a temp table, indexing it, then joining with that table onst_intersects(way_segmentized,bbox_subdivided)
.Subdivision and segmentation of your geometries also speed things up not just because it’s easier to compare smaller objects (same way it’s easier to process shorter texts and smaller numbers) and easier to discard and ignore ones that are far enough to be dismissed (r-tree index), but this also decreases the need/chance of TOASTing them out of their page.
Except
set storage external
and making sure your stats are up to date (other indexes you use benefit from that as well), none of the above will improve the performance if the query targets everything in the table. Rearranging how your data is stored, accessed and processed can help selective queries get their target records faster, but it doesn’t affect your underlying storage medium read speed that will dictate your response time if you ask the db to read the whole thing.Any sufficiently non-specific filter, in your case a large
!bbox!
geometry, like the globalST_MakeEnvelope(-180,-90,180,90,4326)
you mentioned, will ignore your spatial indexes because it’s immediately obvious they match everything in them, so they might just as well not be considered at all.In addition to @Zegarek answer, if you can’t control the bounding box intersection clause because it is automatically added by your map server, you can also create an index on the transformed ways:
Having so large geometry in one record is almost always a bad choice.
Try splitting large geometry (e.g. < 1000 points and < 500 m length) and you will see much more predictable and better performance.
See example of using
ST_Segmentize()
andST_Subdivide()
here – https://postgis.net/docs/ST_Subdivide.html