Excessive Planning Time for PostgreSQL Query Involving Union of 5000 Tables

DavidHJ
October 23, 2023
177 views
1 vote
2 Answers

I am working with a PostgreSQL database where I have a query that unions 5000 tables with identical structures. The query is saved as a view. I’ve noticed that the planning time for this query is significantly longer than the execution time—around 20 seconds for planning versus less than 1 second for execution.

Here’s a simplified example of my query:

CREATE OR REPLACE VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
-- ... (up to 5000 tables)
;

Questions:

Why is the planning time taking so long compared to the execution time?
Are there any optimization techniques to reduce the planning time?
Since the query is saved as a view, is there a way to cache or skip the planning process for subsequent runs, given that the query itself doesn’t change?

Any insights or suggestions would be greatly appreciated.

Answers

- Zegarek
- October 23, 2023 at 1:17 pm
- 0 votes
0
1. Analyzing the structure and coming up with a plan to read 5k tables takes some CPU even if they are simple and empty. The resulting plan is trivial and I suspect tables are a combination of simple, tiny and empty, so execution isn’t a challenge. demo1
2. PostgreSQL caches and considers re-using some query plans by default (plan_cache_mode=auto). You can try to PREPARE a select from this view, and then EXECUTE it multiple times, re-using some of the work put into processing the statement – possibly even the plan: demo2. Note that this plan cache and prepared statement are owned by and exclusively available to a session, not shared, so each client would have to initially run their own PREPARE.
  
  In a test involving 5k empty tables on my db, planning repeatedly took 10s for a series of repeated select * from my_view that unioned them all. Once I PREPAREd and EXECUTEd the statement once, its subsequent EXECUTEs took 4ms to plan.
3. Make this view a materialized view to cache the result: demo3. This can be shared by multiple sessions.
Remember to refresh materialized view whenever you want to discard the old cache. If you want it to automatically refresh on its own, you can set up (statement level) triggers on the source tables that issue a refresh materialized view concurrently whenever there’s a change to be cascaded. There’s a pg_ivm extension that lets you make it refresh incrementally.
Login or Signup to reply.

You very probably missed the table partitioning in your desing

Example of a list partitioned table with 1001 partitions

create table my_table  (a int, b text, c text, d text, e text, f text, g text, h text, i text, j timestamp) PARTITION BY LIST (a);

create table my_table_0 PARTITION of  my_table FOR VALUES IN (0);
...
create table my_table_1000 PARTITION of  my_table FOR VALUES IN (1000);

On empty table the query is close to instant

explain (analyze, buffers, format text)  select * from my_table;
Append  (cost=0.00..14064.05 rows=270270 width=268) (actual time=9.176..9.460 rows=0 loops=1)
  ->  Seq Scan on my_table_0 my_table_1  (cost=0.00..12.70 rows=270 width=268) (actual time=0.028..0.028 rows=0 loops=1)
...

Planning Time: 24.782 ms
Execution Time: 13.456 ms

The your desing with 1001 tables with union view leads in the same empty setup to

Planning Time: 1290.083 ms
Execution Time: 16.412 ms

Please signup or login to give your own answer.

Click here to cancel reply.