skip to Main Content

(See these examples at https://dbfiddle.uk/00lhcAA6. I got the same results on versions 13, 14, and 15.)

Take two tables that are partitioned together and share the same conceptual partition key (created_at).

CREATE TABLE part_main (
   main_id serial,
   create_time timestamptz,
   main_val int,
   primary key (main_id, create_time)
 )
 PARTITION BY RANGE (create_time);

CREATE TABLE part_other (
   other_id serial,
   create_time timestamptz,
   main_id int,
   other_val text,
   primary key (other_id, create_time),
   foreign key (main_id, create_time) references part_main(main_id, create_time)
 )
 PARTITION BY RANGE (create_time);

CREATE TABLE part_main_y2023m02 PARTITION OF part_main
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE part_main_y2023m03 PARTITION OF part_main
    FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
CREATE TABLE part_main_y2023m04 PARTITION OF part_main
    FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

CREATE TABLE part_other_y2023m02 PARTITION OF part_other
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE part_other_y2023m03 PARTITION OF part_other
    FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
CREATE TABLE part_other_y2023m04 PARTITION OF part_other
    FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

And some test data

insert into part_main (create_time, main_val) values ('2023-04-02', 10);
insert into part_other (main_id, create_time, other_val) select main_id, create_time, 'foo' from part_main;

When joining these tables by (main_id, create_time) and filtering on create_time, partitions are only pruned from the left table. In the plan below, see that only one part_main partition is scanned vs. all three from part_other.

explain analyze 
select * from part_main join part_other using (main_id, create_time)
where create_time >= '2023-04-01';
Merge Join  (cost=326.83..356.03 rows=50 width=52) (actual time=0.132..0.135 rows=1 loops=1)
  Merge Cond: ((part_main.main_id = part_other.main_id) AND (part_main.create_time = part_other.create_time))
  ->  Sort  (cost=61.72..63.26 rows=617 width=16) (actual time=0.083..0.083 rows=1 loops=1)
        Sort Key: part_main.main_id, part_main.create_time
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on part_main_y2023m04 part_main  (cost=0.00..33.12 rows=617 width=16) (actual time=0.012..0.013 rows=1 loops=1)
              Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)
  ->  Sort  (cost=265.11..273.13 rows=3210 width=48) (actual time=0.044..0.045 rows=1 loops=1)
        Sort Key: part_other.main_id, part_other.create_time
        Sort Method: quicksort  Memory: 25kB
        ->  Append  (cost=0.00..78.15 rows=3210 width=48) (actual time=0.019..0.021 rows=1 loops=1)
              ->  Seq Scan on part_other_y2023m02 part_other_1  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.004..0.004 rows=0 loops=1)
              ->  Seq Scan on part_other_y2023m03 part_other_2  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.003..0.003 rows=0 loops=1)
              ->  Seq Scan on part_other_y2023m04 part_other_3  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.011..0.011 rows=1 loops=1)

Filtering on the left and right side created_at columns individually produces the expected plan.

explain analyze 
select * from part_main join part_other using (main_id, create_time)
where part_main.create_time >= '2023-04-01' and part_other.create_time >= '2023-04-01';
Merge Join  (cost=100.23..107.60 rows=6 width=52) (actual time=0.086..0.088 rows=1 loops=1)
  Merge Cond: ((part_main.main_id = part_other.main_id) AND (part_main.create_time = part_other.create_time))
  ->  Sort  (cost=61.72..63.26 rows=617 width=16) (actual time=0.026..0.026 rows=1 loops=1)
        Sort Key: part_main.main_id, part_main.create_time
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on part_main_y2023m04 part_main  (cost=0.00..33.12 rows=617 width=16) (actual time=0.014..0.015 rows=1 loops=1)
              Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)
  ->  Sort  (cost=38.51..39.40 rows=357 width=48) (actual time=0.056..0.056 rows=1 loops=1)
        Sort Key: part_other.main_id, part_other.create_time
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on part_other_y2023m04 part_other  (cost=0.00..23.38 rows=357 width=48) (actual time=0.011..0.011 rows=1 loops=1)
              Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)

So it looks like I have a workaround with the second query style, but I’ve got a couple of questions.

  1. Is there some semantic difference between the first and the second styles that I’m not aware of? I.e., is there ever a time when where create_time >= '2023-04-01' and where part_main.create_time >= '2023-04-01' and part_other.create_time >= '2023-04-01' could produce different results?
  2. If not, is there anything I can try to get the planner to produce the desired plan with the more succinct style?

2

Answers


  1. There’s no logical difference between the two queries, but one could argue that there is a semantic difference. The first query says to join the two tables by main_id and create_time and filter on one of the table’s create_time column (because the reference to create_time is unqualified, PostgreSQL can choose either table). PostgreSQL performs the join for the general case of main_id and create_time matching but doesn’t take advantage of the transitive logical restrictions on which of the second table’s partitions are viable.

    The second query says to join the two tables by main_id and create_time, but also explicity filters both by create_time. The end result is the same, but the instructions about how to get there are different.

    There isn’t any practical way to coerce the planner to select a specific plan. As a general princple, one shouldn’t try to force the planner’s choices. The goal should be to design an environment that facilitates the planner’s ability to find an efficient, although not necessarily optimal, plan.

    Login or Signup to reply.
  2. 1)
    There is no difference. The join predicate using (main_id, create_time) is a shorthand for

    on part_main.main_id = part_other.main_id and  
    part_main.create_time = part_other.create_time
    

    As this is an inner join from part_main.create_time = part_other.create_timeusing the rules of logic implies that

    part_main.create_time >= '2023-03-01' is equivalent to part_other.create_time >= '2023-03-01'

    So adding it it will not change the result.

    2)
    But there is no need to add the second condition.

    As the documentation says the default for the enable_partitionwise_join is false which is probably your case.

    After setting to SET enable_partitionwise_join = true; you will see the execution plan selecting only the relavent partitions

    explain analyze 
    select * from part_main join part_other using (main_id, create_time)
    where create_time >= '2023-04-01';
    
    Merge Join  (cost=136.26..149.08 rows=17 width=52) (actual time=0.032..0.033 rows=1 loops=1)
      Merge Cond: ((part_main.main_id = part_other.main_id) AND (part_main.create_time = part_other.create_time))
      ->  Sort  (cost=61.72..63.26 rows=617 width=16) (actual time=0.012..0.013 rows=1 loops=1)
            Sort Key: part_main.main_id, part_main.create_time
            Sort Method: quicksort  Memory: 25kB
            ->  Seq Scan on part_main_y2023m04 part_main  (cost=0.00..33.12 rows=617 width=16) (actual time=0.007..0.008 rows=1 loops=1)
                  Filter: (create_time >= '2023-04-01 00:00:00+02'::timestamp with time zone)
      ->  Sort  (cost=74.54..77.21 rows=1070 width=48) (actual time=0.015..0.016 rows=1 loops=1)
            Sort Key: part_other.main_id, part_other.create_time
            Sort Method: quicksort  Memory: 25kB
            ->  Seq Scan on part_other_y2023m04 part_other  (cost=0.00..20.70 rows=1070 width=48) (actual time=0.007..0.008 rows=1 loops=1)
    Planning Time: 0.216 ms
    Execution Time: 0.052 ms 
    

    So there is no need to extend the predicate and additionaly you get with the partition-wise join a better performance (as only the corresponding partitions are joined one to one; instead of pruning partitions, appending them and joining them).

    Finaly let me make a remark about your primary key (other_id, create_time) -apparently only the first column is a PK, the second column is added to enable the partitioning. This opens the question if the PK is meaningfull at all.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search