I am working with a PostgreSQL database where I have a query that unions 5000 tables with identical structures. The query is saved as a view. I’ve noticed that the planning time for this query is significantly longer than the execution time—around 20 seconds for planning versus less than 1 second for execution.
Here’s a simplified example of my query:
CREATE OR REPLACE VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
-- ... (up to 5000 tables)
;
Questions:
- Why is the planning time taking so long compared to the execution time?
- Are there any optimization techniques to reduce the planning time?
- Since the query is saved as a view, is there a way to cache or skip the planning process for subsequent runs, given that the query itself doesn’t change?
Any insights or suggestions would be greatly appreciated.
2
Answers
Analyzing the structure and coming up with a plan to read 5k tables takes some CPU even if they are simple and empty. The resulting plan is trivial and I suspect tables are a combination of simple, tiny and empty, so execution isn’t a challenge. demo1
PostgreSQL caches and considers re-using some query plans by default (
plan_cache_mode=auto
). You can try toPREPARE
a select from this view, and thenEXECUTE
it multiple times, re-using some of the work put into processing the statement – possibly even the plan: demo2. Note that this plan cache and prepared statement are owned by and exclusively available to a session, not shared, so each client would have to initially run their ownPREPARE
.In a test involving 5k empty tables on my db, planning repeatedly took
10s
for a series of repeatedselect * from my_view
that unioned them all. Once IPREPARE
d andEXECUTE
d the statement once, its subsequentEXECUTE
s took4ms
to plan.Make this view a
materialized view
to cache the result: demo3. This can be shared by multiple sessions.Remember to
refresh materialized view
whenever you want to discard the old cache. If you want it to automatically refresh on its own, you can set up (statement level) triggers on the source tables that issue arefresh materialized view concurrently
whenever there’s a change to be cascaded. There’s apg_ivm
extension that lets you make it refresh incrementally.You very probably missed the table partitioning in your desing
Example of a list partitioned table with 1001 partitions
On empty table the query is close to instant
The your desing with 1001 tables with union view leads in the same empty setup to