
How does the ON predicate of a Postgres LATERAL JOIN work?

Let me clarify the question a bit. I’ve read the official documentation and a bunch of articles about this kind of JOIN.

As far as I understand, it is a foreach loop with a correlated subquery inside:

  • it iterates over all records of table A, allowing a correlated subquery B to reference columns of the "current" row, and joins the result set of B to that "current" row of A. If B returns 1 row, there is only one pair; if B returns N rows, there are N pairs, each repeating the "current" row of A. This is the same behavior as in ordinary JOINs.

But why is there a need for an ON predicate?
In ordinary JOINs we use ON because we have a cartesian product of 2 tables that has to be filtered, which is not the case for a LATERAL JOIN: it produces the resulting pairs directly.
In other words, in my experience as a developer I’ve only seen CROSS JOIN LATERAL and LEFT JOIN LATERAL () ON TRUE (the latter looks quite clumsy, though), but one day a colleague showed me

SELECT
r.acceptance_status, count(*) as count
FROM route r
LEFT JOIN LATERAL (
    SELECT rts.route_id, array_agg(rts.shipment_id) shipment_ids
    FROM route_to_shipment rts
    where rts.route_id = r.route_id
    GROUP BY rts.route_id
) rts using (route_id)

and this blew my mind. Why using (route_id)? We already have where rts.route_id = r.route_id inside the subquery!

Maybe I misunderstand the mechanics of LATERAL joins?


Answers


  1. A variant of this question has been answered at https://dba.stackexchange.com/questions/301884/do-postgresql-lateral-joins-require-or-allow-an-on-clause.

    In short, a join condition (ON or USING) is a syntactic requirement for every join type other than CROSS JOIN and NATURAL JOIN (the latter of which is an ill-conceived idea that should be expunged from SQL). For LEFT JOIN LATERAL, use ON TRUE instead of USING to avoid unnecessary dependencies on the subquery’s select list.
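
    A minimal sketch of that suggestion, applied to the query from the question. The outer GROUP BY is an assumption, since the question’s snippet does not show one. With ON TRUE the subquery no longer has to expose route_id just to satisfy the join clause:

    SELECT r.acceptance_status, count(*) AS count
    FROM route r
    LEFT JOIN LATERAL (
        -- the correlation with the outer row lives only in the WHERE clause
        SELECT array_agg(rts.shipment_id) AS shipment_ids
        FROM route_to_shipment rts
        WHERE rts.route_id = r.route_id
        GROUP BY rts.route_id
    ) rts ON TRUE
    -- assumed: the question's snippet omits the outer grouping
    GROUP BY r.acceptance_status;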

  2. It does make a lot of sense, even though it doesn’t look like it should. The reason has less to do with the declarative meaning of either form of the statement than with what the planner/optimizer makes of it: an explicit join condition clearly communicates the dependency and lets the planner inspect the relation between the joined tables and apply suitable optimisation techniques.

    A lateral join (...) subquery on true only says that the subquery is meant to be evaluated for each row, and on true obfuscates how it depends on that row. The dependency stays hidden as internal logic of the lateral subquery; lateral merely allows it to use the outer reference without communicating much more than that to the planner, so the join is left unoptimised. Ideally, the planner would peek inside and see the where, but it doesn’t (at least as of PostgreSQL 16.1). It doesn’t do that either if you move the where out to the outer query, old-implicit-join-style, although that does help it speed things up in other ways.

    When you run your colleague’s query with using, it becomes obvious that the join gains nothing from being lateral and, further, that no join is required at all. You’re not selecting anything from the subquery, so it only provides matches for route. That could matter when counting combinations of matched rows, but because the subquery also aggregates, it can provide at most a single match per route. In the end it contributes nothing to the query, which can effectively be shortened, first to this:

    SELECT
    r.acceptance_status, count(*) as count
    FROM route r
    LEFT JOIN LATERAL (
        SELECT distinct rts.route_id
        FROM route_to_shipment rts
    ) rts using (route_id)
    group by 1
    

    But since any route_to_shipment.route_id that doesn’t match a route.route_id is ignored by the left join, and the left join returns every route row regardless of whether its route_id appears in the subquery, the join can be removed completely:

    SELECT r.acceptance_status, count(*)
    FROM route r group by 1;
    

    You’ll get the exact same plan for all three forms of the query, unsurprisingly short and quick in all cases:

    QUERY PLAN
    Sort  (cost=41.60..42.10 rows=200 width=40) (actual time=0.590..0.591 rows=5 loops=1)
      Output: r.acceptance_status, (count(*))
      Sort Key: r.acceptance_status, (count(*))
      Sort Method: quicksort  Memory: 25kB
      ->  HashAggregate  (cost=31.95..33.95 rows=200 width=40) (actual time=0.583..0.585 rows=5 loops=1)
            Output: r.acceptance_status, count(*)
            Group Key: r.acceptance_status
            Batches: 1  Memory Usage: 40kB
            ->  Seq Scan on public.route r  (cost=0.00..24.97 rows=1397 width=32) (actual time=0.007..0.158 rows=2000 loops=1)
                  Output: r.route_id, r.acceptance_status
    Planning Time: 0.056 ms
    Execution Time: 0.608 ms
    

    If you instead use left join lateral ... on true, you’re stating that you want the subquery evaluated for every row, no matter what, and you obfuscate the dependency, so the planner does literally that:

    QUERY PLAN
    Incremental Sort  (cost=475713.66..475735.06 rows=200 width=40) (actual time=3076.386..3076.388 rows=5 loops=1)
      Output: r.acceptance_status, (count(*))
      Sort Key: r.acceptance_status, (count(*))
      Presorted Key: r.acceptance_status
      Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
      ->  GroupAggregate  (cost=475713.59..475726.06 rows=200 width=40) (actual time=3076.153..3076.348 rows=5 loops=1)
            Output: r.acceptance_status, count(*)
            Group Key: r.acceptance_status
            ->  Sort  (cost=475713.59..475717.08 rows=1397 width=32) (actual time=3076.035..3076.135 rows=2000 loops=1)
                  Output: r.acceptance_status
                  Sort Key: r.acceptance_status
                  Sort Method: quicksort  Memory: 86kB
                  ->  Nested Loop Left Join  (cost=0.00..475640.60 rows=1397 width=32) (actual time=34.858..3074.645 rows=2000 loops=1)
                        Output: r.acceptance_status
                        ->  Seq Scan on public.route r  (cost=0.00..24.97 rows=1397 width=36) (actual time=0.013..0.387 rows=2000 loops=1)
                              Output: r.route_id, r.acceptance_status
                        ->  GroupAggregate  (cost=0.00..340.44 rows=1 width=36) (actual time=1.536..1.536 rows=1 loops=2000)
                              Output: rts.route_id, NULL::integer[]
                              ->  Seq Scan on public.route_to_shipment rts  (cost=0.00..340.43 rows=101 width=4) (actual time=0.166..1.532 rows=10 loops=2000)
                                    Output: rts.shipment_id, rts.route_id
                                    Filter: (rts.route_id = r.route_id)
                                    Rows Removed by Filter: 19990
    Planning Time: 0.134 ms
    JIT:
      Functions: 11
      Options: Inlining false, Optimization false, Expressions true, Deforming true
      Timing: Generation 0.896 ms, Inlining 0.000 ms, Optimization 1.520 ms, Emission 31.319 ms, Total 33.735 ms
    Execution Time: 3279.075 ms
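
    The exact statements and data behind these plans are not reproduced here. As a rough sketch for repeating the comparison, assuming a hypothetical schema guessed from the plan output (integer keys, 2000 routes with 10 shipments each) and the same outer group by 1 as in the shortened forms, both join variants can be run under EXPLAIN (ANALYZE, VERBOSE). Plans and timings will differ by version, data, and settings:

    -- Hypothetical setup: table shapes and row counts are guessed from the plans.
    CREATE TABLE route (
        route_id          integer,
        acceptance_status text
    );
    CREATE TABLE route_to_shipment (
        route_id    integer,
        shipment_id integer
    );
    -- 2000 routes spread over 5 made-up status values
    INSERT INTO route
    SELECT i, 'status_' || (i % 5)
    FROM generate_series(1, 2000) AS i;
    -- 20000 shipments, 10 per route
    INSERT INTO route_to_shipment
    SELECT (s % 2000) + 1, s
    FROM generate_series(1, 20000) AS s;

    -- Variant 1: join condition stated with USING
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT r.acceptance_status, count(*) AS count
    FROM route r
    LEFT JOIN LATERAL (
        SELECT rts.route_id, array_agg(rts.shipment_id) shipment_ids
        FROM route_to_shipment rts
        WHERE rts.route_id = r.route_id
        GROUP BY rts.route_id
    ) rts USING (route_id)
    GROUP BY 1;

    -- Variant 2: same subquery, join condition replaced by ON TRUE
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT r.acceptance_status, count(*) AS count
    FROM route r
    LEFT JOIN LATERAL (
        SELECT rts.route_id, array_agg(rts.shipment_id) shipment_ids
        FROM route_to_shipment rts
        WHERE rts.route_id = r.route_id
        GROUP BY rts.route_id
    ) rts ON TRUE
    GROUP BY 1;

    Comparing the two outputs shows whether the join node survives in each variant on your own server.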
    

    Complete demo at db<>fiddle:
