skip to Main Content

Using Postgres and Postgis I was experiencing a very slow spatial join using st_equals. Out of curiosity, I tried using = instead, and the join was much faster. Using EXPLAIN, I noticed that the st_equals join uses a nested loop, while the = uses a hash join.

I am far from an expert here – what is going on? I have found the faster option here, but there could be times when I’d prefer to use st_equals AND have it be speedy…

For what it is worth, the geom columns from both tables are both gist indexed, and st_equals takes advantage of indices. Both tables have ~220K rows.

Here are the two select queries, followed by the two EXPLAIN results. (Yes, I am finding records in one table that don’t exist in the other, and yes I know there are other ways to do this but for now let’s set that aside!)

A

explain
 select p1.geom from schema1.parcel p1
   join schema2.parcel p2
    on st_equals(p2.geom, p1.geom)
  where
    p2.pid is null

B

explain
 select p1.geom from schema1.parcel p1
   join schema2.parcel p2
     on p2.geom = p1.geom
  where
    p2.pid is null

A EXPLAIN

Gather  (cost=1000.42..2819605.27 rows=808 width=296)
  Workers Planned: 2
  ->  Nested Loop  (cost=0.42..2818524.47 rows=337 width=296)
        Join Filter: st_equals(p2.geom, p1.geom)
        ->  Parallel Seq Scan on parcel p1  (cost=0.00..9390.93 rows=95393 width=296)
        ->  Index Scan using parcel_pkey on parcel p2  (cost=0.42..4.44 rows=1 width=300)
              Index Cond: (pid IS NULL)

B EXPLAIN

Gather  (cost=1004.45..10776.96 rows=229 width=296)
  Workers Planned: 2
  ->  Hash Join  (cost=4.45..9754.06 rows=95 width=296)
        Hash Cond: (p1.geom = p2.geom)
        ->  Parallel Seq Scan on parcel p1  (cost=0.00..9390.93 rows=95393 width=296)
        ->  Hash  (cost=4.44..4.44 rows=1 width=300)
              ->  Index Scan using parcel_pkey on parcel p2  (cost=0.42..4.44 rows=1 width=300)
                    Index Cond: (pid IS NULL)

(if this is better on the gis.stackexchange site, please let me know…)

2

Answers


  1. That’s easy to explain: hash and merge join are only possible if the join condition uses the equality operator (=). Since your join condition is on a function result (st_equals(p2.geom, p1.geom)), PostgreSQL cannot use either of these join algorithms and has to resort to a nested loop join. st_equals() is quite different from =, by the way.

    Login or Signup to reply.
  2. In case of join postgres switches to nested join for below reasons:

    1. One/Both of the column of joining table does not have index.
    2. If nested join provides result quicker than hash/merge join.
    3. Sometimes analysis provides wrong explain analysis if information is not updated.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search