PostGIS table with large geometry column performance - Postgresql

kolisko
February 12, 2023
311 views
2 votes
3 Answers

The routed_way table has a column named way where the path of a vehicle is stored. The length of these paths can vary greatly, with some being short and others containing millions of points. Although the table only has thousands of rows, there are two issues that need to be addressed.

Deleting rows from this table is very slow and can take several minutes.
Queries using the "st_intersects" function are also slow and take nearly one minute to complete.

create table public.routed_way
(
    id              uuid default gen_random_uuid() not null primary key,
    created_at      timestamp with time zone default CURRENT_TIMESTAMP,
    updated_at      timestamp with time zone,
    sourcer_class   text not null,
    sourcer_id      text  not null,
    route_start     timestamp with time zone,
    route_end       timestamp with time zone not null,
    way             geometry(LineString, 4326) not null,
    time_delta      integer[],
    elevation_delta integer[]
);
alter table public.routed_way owner to postgres;
create index idx_routed_way_created_at    on public.routed_way (created_at);
create index idx_routed_way_sourcer_class on public.routed_way (sourcer_class);
create index idx_routed_way_route_start   on public.routed_way (route_start);
create index idx_routed_way_way           on public.routed_way using gist (way);

select 
    id, 
    route_start, 
    route_end, 
    st_transform(way, 3857) as geometry
from routed_way
where 
    (route_start > now() - interval '7 day' 
    and route_start < now() - interval '3 day')
  and st_intersects(st_transform(way, 3857),!bbox!)

Gather  (cost=1072.66..63827.33 rows=1 width=64)
  Workers Planned: 1
  ->  Parallel Bitmap Heap Scan on routed_way  (cost=72.66..62827.23 rows=1 width=64)
        Recheck Cond: ((route_start > (now() - '7 days'::interval)) AND (route_start < (now() - '3 days'::interval)))
"        Filter: st_intersects(st_transform(way, 3857), '0103000020110F00000100000005000000010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEF47010000E0FFFFEFC7010000E0FFFFEFC7010000E0FFFFEFC7'::geometry)"
        ->  Bitmap Index Scan on idx_routed_way_route_start  (cost=0.00..72.66 rows=2037 width=0)
              Index Cond: ((route_start > (now() - '7 days'::interval)) AND (route_start < (now() - '3 days'::interval)))

I am unsure what values Mapnik is using for the !bbox! so I attempted to use some values in st_makeenvelope(), but according to the EXPLAIN output, it seems that the desired index was not applied.

Explain analyze verbose select
    id,
    route_start,
    route_end,
    st_transform(way, 3857) as geometry
from routed_way
where
        (route_start > now() - interval '7 day'
  and route_start < now() - interval '3 day')
  and st_intersects(way,ST_MakeEnvelope(-180, -90, 180, 90, 4326));

Bitmap Heap Scan on public.routed_way  (cost=35.31..93324.71 rows=1856 width=64) (actual time=5.529..7573.988 rows=1848 loops=1)
"  Output: id, route_start, route_end, st_transform(way, 3857)"
  Recheck Cond: ((routed_way.route_start > (now() - '7 days'::interval)) AND (routed_way.route_start < (now() - '3 days'::interval)))
"  Filter: st_intersects(routed_way.way, '0103000020E6100000010000000500000000000000008066C000000000008056C000000000008066C0000000000080564000000000008066400000000000805640000000000080664000000000008056C000000000008066C000000000008056C0'::geometry)"
  Heap Blocks: exact=345
  ->  Bitmap Index Scan on idx_routed_way_route_start  (cost=0.00..34.85 rows=1856 width=0) (actual time=0.091..0.092 rows=1848 loops=1)
        Index Cond: ((routed_way.route_start > (now() - '7 days'::interval)) AND (routed_way.route_start < (now() - '3 days'::interval)))
Planning Time: 0.143 ms
Execution Time: 7574.252 ms

postgres=> d+ routed_way
                                                   Table "public.routed_way"
     Column      |           Type            | Collation | Nullable |      Default      | Storage  | Stats target | Description 
-----------------+---------------------------+-----------+----------+-------------------+----------+--------------+-------------
 id              | uuid                      |           | not null | gen_random_uuid() | plain    |              | 
 created_at      | timestamp with time zone  |           |          | CURRENT_TIMESTAMP | plain    |              | 
 updated_at      | timestamp with time zone  |           |          |                   | plain    |              | 
 sourcer_class   | text                      |           | not null |                   | extended |              | 
 sourcer_id      | text                      |           | not null |                   | extended |              | 
 route_start     | timestamp with time zone  |           |          |                   | plain    |              | 
 route_end       | timestamp with time zone  |           | not null |                   | plain    |              | 
 way             | geometry(LineString,4326) |           | not null |                   | external |              | 
 time_delta      | integer[]                 |           |          |                   | extended |              | 
 elevation_delta | integer[]                 |           |          |                   | extended |              | 
Indexes:
    "routed_way_pkey" PRIMARY KEY, btree (id)
    "idx_routed_way_created_at" btree (created_at)
    "idx_routed_way_route_start" btree (route_start)
    "idx_routed_way_sourcer_class" btree (sourcer_class)
    "idx_routed_way_way" gist (way) CLUSTER
Referenced by:
    TABLE "routed_point" CONSTRAINT "fk_routed_point_way_id" FOREIGN KEY (way_id) REFERENCES routed_way(id) ON UPDATE CASCADE ON DELETE CASCADE
Access method: heap

Answers

- Zegarek
- February 12, 2023 at 1:01 pm
- 0 votes
0
According to what you initially presented, you have no index on the geometry type column way. That’s causing the heap scan.
1. You need to apply a GiST index on the column to speed up all PostGIS ST_ operations on it.
2. Speed things up further by skipping the TOAST mechanism – set storage external.
3. cluster using that index
4. Make sure the table statistics are up to date with analyze
5. If the traffic on your table isn’t high, you can increase the fillfactor of your index.
```
alter table public.routed_way
   alter column way
   set storage external;
create index idx_routed_way_gist_way 
   on public.routed_way 
   using gist(way) 
--Default is 90. 100 is for static, read-only.
   with (fillfactor=100); 
cluster verbose public.routed_way using idx_routed_way_gist_way;
analyze public.routed_way;
```
You are also using ST_Transform on the whole way column instead of just the !bbox! geometry you’re intersecting with, which seems counter-productive. If you reverse that, it will have to be done just once per statement instead of once per target table record:
```
select 
    id, 
    route_start, 
    route_end, 
    st_transform(way, 3857) as geometry
from routed_way
where 
    (route_start > now() - interval '7 day' 
    and route_start < now() - interval '3 day')
  and st_intersects(way,st_transform(!bbox!,4326));
```
If your delete statements were using a similar, geometry-based where clause, you can speed them up the same way.

In principle, the larger your geometries get the less useful your index will be: if the bbox of your geometry spans the entire target area, it’s assumed to be spatially related to everything. You can try and use ST_Segmentize() to cut it up into smaller segments with smaller bounding boxes, which will speed everything up dramatically, similar to how ST_Subdivide() is used (on all the Things).

In an extreme case, if the !bbox! geometry you’re comparing to is large enough, you could consider st_subdividing it it into a temp table, indexing it, then joining with that table on st_intersects(way_segmentized,bbox_subdivided).

Subdivision and segmentation of your geometries also speed things up not just because it’s easier to compare smaller objects (same way it’s easier to process shorter texts and smaller numbers) and easier to discard and ignore ones that are far enough to be dismissed (r-tree index), but this also decreases the need/chance of TOASTing them out of their page.

Except set storage external and making sure your stats are up to date (other indexes you use benefit from that as well), none of the above will improve the performance if the query targets everything in the table. Rearranging how your data is stored, accessed and processed can help selective queries get their target records faster, but it doesn’t affect your underlying storage medium read speed that will dictate your response time if you ask the db to read the whole thing.

Any sufficiently non-specific filter, in your case a large !bbox! geometry, like the global ST_MakeEnvelope(-180,-90,180,90,4326) you mentioned, will ignore your spatial indexes because it’s immediately obvious they match everything in them, so they might just as well not be considered at all.
Login or Signup to reply.

- JGH
- February 13, 2023 at 3:53 pm
- 0 votes
0
In addition to @Zegarek answer, if you can’t control the bounding box intersection clause because it is automatically added by your map server, you can also create an index on the transformed ways:
```
create index idx_routed_way_way_3857
    on public.routed_way using gist (st_transform(way,3857));
```
Login or Signup to reply.

- mikkapy
- February 13, 2023 at 10:10 pm
- 0 votes
0
Having so large geometry in one record is almost always a bad choice.

Try splitting large geometry (e.g. < 1000 points and < 500 m length) and you will see much more predictable and better performance.

See example of using ST_Segmentize() and ST_Subdivide() here – https://postgis.net/docs/ST_Subdivide.html
```
SELECT ST_Subdivide( 
         ST_Segmentize( 'LINESTRING(0 0, 85 85)'::geography
                      , 1200000 -- max segment length, m
                      ) ::geometry
         , 8 -- max vertices
        );
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

PostGIS table with large geometry column performance – Postgresql

Answers