(See these examples at https://dbfiddle.uk/00lhcAA6. I got the same results on versions 13, 14, and 15.)
Take two tables that are partitioned together and share the same conceptual partition key (created_at
).
CREATE TABLE part_main (
main_id serial,
create_time timestamptz,
main_val int,
primary key (main_id, create_time)
)
PARTITION BY RANGE (create_time);
CREATE TABLE part_other (
other_id serial,
create_time timestamptz,
main_id int,
other_val text,
primary key (other_id, create_time),
foreign key (main_id, create_time) references part_main(main_id, create_time)
)
PARTITION BY RANGE (create_time);
CREATE TABLE part_main_y2023m02 PARTITION OF part_main
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE part_main_y2023m03 PARTITION OF part_main
FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
CREATE TABLE part_main_y2023m04 PARTITION OF part_main
FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');
CREATE TABLE part_other_y2023m02 PARTITION OF part_other
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE part_other_y2023m03 PARTITION OF part_other
FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
CREATE TABLE part_other_y2023m04 PARTITION OF part_other
FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');
And some test data
insert into part_main (create_time, main_val) values ('2023-04-02', 10);
insert into part_other (main_id, create_time, other_val) select main_id, create_time, 'foo' from part_main;
When joining these tables by (main_id, create_time)
and filtering on create_time
, partitions are only pruned from the left table. In the plan below, see that only one part_main
partition is scanned vs. all three from part_other
.
explain analyze
select * from part_main join part_other using (main_id, create_time)
where create_time >= '2023-04-01';
Merge Join (cost=326.83..356.03 rows=50 width=52) (actual time=0.132..0.135 rows=1 loops=1)
Merge Cond: ((part_main.main_id = part_other.main_id) AND (part_main.create_time = part_other.create_time))
-> Sort (cost=61.72..63.26 rows=617 width=16) (actual time=0.083..0.083 rows=1 loops=1)
Sort Key: part_main.main_id, part_main.create_time
Sort Method: quicksort Memory: 25kB
-> Seq Scan on part_main_y2023m04 part_main (cost=0.00..33.12 rows=617 width=16) (actual time=0.012..0.013 rows=1 loops=1)
Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)
-> Sort (cost=265.11..273.13 rows=3210 width=48) (actual time=0.044..0.045 rows=1 loops=1)
Sort Key: part_other.main_id, part_other.create_time
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.00..78.15 rows=3210 width=48) (actual time=0.019..0.021 rows=1 loops=1)
-> Seq Scan on part_other_y2023m02 part_other_1 (cost=0.00..20.70 rows=1070 width=48) (actual time=0.004..0.004 rows=0 loops=1)
-> Seq Scan on part_other_y2023m03 part_other_2 (cost=0.00..20.70 rows=1070 width=48) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on part_other_y2023m04 part_other_3 (cost=0.00..20.70 rows=1070 width=48) (actual time=0.011..0.011 rows=1 loops=1)
Filtering on the left and right side created_at
columns individually produces the expected plan.
explain analyze
select * from part_main join part_other using (main_id, create_time)
where part_main.create_time >= '2023-04-01' and part_other.create_time >= '2023-04-01';
Merge Join (cost=100.23..107.60 rows=6 width=52) (actual time=0.086..0.088 rows=1 loops=1)
Merge Cond: ((part_main.main_id = part_other.main_id) AND (part_main.create_time = part_other.create_time))
-> Sort (cost=61.72..63.26 rows=617 width=16) (actual time=0.026..0.026 rows=1 loops=1)
Sort Key: part_main.main_id, part_main.create_time
Sort Method: quicksort Memory: 25kB
-> Seq Scan on part_main_y2023m04 part_main (cost=0.00..33.12 rows=617 width=16) (actual time=0.014..0.015 rows=1 loops=1)
Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)
-> Sort (cost=38.51..39.40 rows=357 width=48) (actual time=0.056..0.056 rows=1 loops=1)
Sort Key: part_other.main_id, part_other.create_time
Sort Method: quicksort Memory: 25kB
-> Seq Scan on part_other_y2023m04 part_other (cost=0.00..23.38 rows=357 width=48) (actual time=0.011..0.011 rows=1 loops=1)
Filter: (create_time >= '2023-04-01 00:00:00+01'::timestamp with time zone)
So it looks like I have a workaround with the second query style, but I’ve got a couple of questions.
- Is there some semantic difference between the first and the second styles that I’m not aware of? I.e., is there ever a time when
where create_time >= '2023-04-01'
andwhere part_main.create_time >= '2023-04-01' and part_other.create_time >= '2023-04-01'
could produce different results? - If not, is there anything I can try to get the planner to produce the desired plan with the more succinct style?
2
Answers
There’s no logical difference between the two queries, but one could argue that there is a semantic difference. The first query says to join the two tables by main_id and create_time and filter on one of the table’s create_time column (because the reference to create_time is unqualified, PostgreSQL can choose either table). PostgreSQL performs the join for the general case of main_id and create_time matching but doesn’t take advantage of the transitive logical restrictions on which of the second table’s partitions are viable.
The second query says to join the two tables by main_id and create_time, but also explicity filters both by create_time. The end result is the same, but the instructions about how to get there are different.
There isn’t any practical way to coerce the planner to select a specific plan. As a general princple, one shouldn’t try to force the planner’s choices. The goal should be to design an environment that facilitates the planner’s ability to find an efficient, although not necessarily optimal, plan.
1)
There is no difference. The join predicate
using (main_id, create_time)
is a shorthand forAs this is an inner join from
part_main.create_time = part_other.create_time
using the rules of logic implies thatpart_main.create_time >= '2023-03-01'
is equivalent topart_other.create_time >= '2023-03-01'
So adding it it will not change the result.
2)
But there is no need to add the second condition.
As the documentation says the default for the enable_partitionwise_join is
false
which is probably your case.After setting to
SET enable_partitionwise_join = true;
you will see the execution plan selecting only the relavent partitionsSo there is no need to extend the predicate and additionaly you get with the partition-wise join a better performance (as only the corresponding partitions are joined one to one; instead of pruning partitions, appending them and joining them).
Finaly let me make a remark about your
primary key (other_id, create_time)
-apparently only the first column is aPK
, the second column is added to enable the partitioning. This opens the question if thePK
is meaningfull at all.