Using Postgres and Postgis I was experiencing a very slow spatial join using st_equals
. Out of curiosity, I tried using =
instead, and the join was much faster. Using EXPLAIN
, I noticed that the st_equals
join uses a nested loop, while the =
uses a hash join.
I am far from an expert here – what is going on? I have found the faster option here, but there could be times when I’d prefer to use st_equals
AND have it be speedy…
For what it is worth, the geom
columns from both tables are both gist indexed, and st_equals takes advantage of indices. Both tables have ~220K rows.
Here are the two select queries, followed by the two EXPLAIN results. (Yes, I am finding records in one table that don’t exist in the other, and yes I know there are other ways to do this but for now let’s set that aside!)
A
explain
select p1.geom from schema1.parcel p1
join schema2.parcel p2
on st_equals(p2.geom, p1.geom)
where
p2.pid is null
B
explain
select p1.geom from schema1.parcel p1
join schema2.parcel p2
on p2.geom = p1.geom
where
p2.pid is null
A EXPLAIN
Gather (cost=1000.42..2819605.27 rows=808 width=296)
Workers Planned: 2
-> Nested Loop (cost=0.42..2818524.47 rows=337 width=296)
Join Filter: st_equals(p2.geom, p1.geom)
-> Parallel Seq Scan on parcel p1 (cost=0.00..9390.93 rows=95393 width=296)
-> Index Scan using parcel_pkey on parcel p2 (cost=0.42..4.44 rows=1 width=300)
Index Cond: (pid IS NULL)
B EXPLAIN
Gather (cost=1004.45..10776.96 rows=229 width=296)
Workers Planned: 2
-> Hash Join (cost=4.45..9754.06 rows=95 width=296)
Hash Cond: (p1.geom = p2.geom)
-> Parallel Seq Scan on parcel p1 (cost=0.00..9390.93 rows=95393 width=296)
-> Hash (cost=4.44..4.44 rows=1 width=300)
-> Index Scan using parcel_pkey on parcel p2 (cost=0.42..4.44 rows=1 width=300)
Index Cond: (pid IS NULL)
(if this is better on the gis.stackexchange site, please let me know…)
2
Answers
That’s easy to explain: hash and merge join are only possible if the join condition uses the equality operator (
=
). Since your join condition is on a function result (st_equals(p2.geom, p1.geom)
), PostgreSQL cannot use either of these join algorithms and has to resort to a nested loop join.st_equals()
is quite different from=
, by the way.In case of join postgres switches to nested join for below reasons: