PostgreSQL - When is it better to use a CTE or a temp table?

BobMarley
September 9, 2022
73 views
1 vote
2 Answers

I am running a query on a very large data set using WITH (CTE) syntax. It seems to take a while, and I was reading online that temp tables could be faster in these cases. Can someone advise me which direction to go? In the CTE we join a lot of tables, and then we filter on the CTE result.

Only interested in PostgreSQL answers.

Tags: postgresql

Answers

- Shameel
- September 9, 2022 at 7:44 pm
- 0 votes
What version of PostgreSQL are you using? CTEs behave differently in PostgreSQL 11 and older than in PostgreSQL 12 and above.
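If you're not sure, the server can tell you. A minimal check, relying on `server_version_num` encoding PostgreSQL 10+ versions as major × 10000 + minor:

```sql
-- Full version string, including platform and compiler details.
select version();

-- server_version_num is an integer (e.g. 110010 for 11.10, 120000+ for 12.x),
-- which is easy to compare: true means simple CTEs can be inlined by default.
select current_setting('server_version_num')::int >= 120000 as is_pg12_or_later;
```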

In PostgreSQL 11 and older, CTEs are optimization fences: restrictions from the outer query are not pushed down into the CTE. The database evaluates the query inside the CTE in full, stores (materializes) the result, and only then applies the outer WHERE clause. For a large table this usually means reading far more rows than the outer query needs (for example, a full table scan instead of an index lookup), which performs horribly. To avoid this, put as many of your filters as possible in the WHERE clause inside the CTE:
```
WITH UserRecord AS (SELECT * FROM Users WHERE Id = 100)
SELECT * FROM UserRecord;
```
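PostgreSQL 12 addresses this problem: by default, a non-recursive CTE without side effects that is referenced only once is inlined into the outer query, so outer filters can be pushed down into it. Version 12 also added the MATERIALIZED and NOT MATERIALIZED clauses so you can control this explicitly: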
```
WITH AllUsers AS NOT MATERIALIZED (SELECT * FROM Users)
SELECT * FROM AllUsers WHERE Id = 100;
```
Note: Text and code examples are taken from my book Migrating your SQL Server Workloads to PostgreSQL

Summary:
PostgreSQL 11 and older: use a subquery

PostgreSQL 12 and above: use a CTE with the NOT MATERIALIZED clause
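To check which behavior your server actually uses, compare plans with EXPLAIN. This is a sketch that uses a throwaway temporary `users` table as a hypothetical stand-in for the `Users` table above. On PostgreSQL 12+, the plain CTE and the subquery should both show an index lookup on `id`, while the MATERIALIZED version should show a `CTE Scan` with the filter applied afterwards. The MATERIALIZED keyword is a syntax error on 11 and older, so skip that statement there.

```sql
-- Hypothetical stand-in for the Users table; qualifying with pg_temp ensures
-- the drop can only ever touch a temporary table.
drop table if exists pg_temp.users;
create temp table users (id int primary key, name text);
insert into users
  select g, 'user ' || g from generate_series(1, 100000) as g;
analyze users;  -- give the planner statistics for the new table

-- 1. Plain CTE: a fence on PostgreSQL 11 and older; inlined on 12+
--    because it is non-recursive, side-effect free and referenced once.
explain
with all_users as (select * from users)
select * from all_users where id = 100;

-- 2. Simple subquery: pulled up into the outer query on all versions,
--    so id = 100 can use the primary key index.
explain
select * from (select * from users) as all_users
where id = 100;

-- 3. PostgreSQL 12+ only: MATERIALIZED forces the old fence behavior;
--    look for "CTE Scan on all_users" with the id = 100 filter on top.
explain
with all_users as materialized (select * from users)
select * from all_users where id = 100;
```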

- Hambone
- September 12, 2022 at 6:15 pm
- 0 votes
My follow-up is more than I can fit in a comment, so understand that this may not be an answer to the OP per se.

Take the following query, which uses a CTE:
```
with sales as (
  select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
  from sales_data
  where country = 'USA'
  group by item
),
inventory as (
  select item, sum (on_hand_qty) as inventory_qty
  from inventory_data
  where country = 'USA' and on_hand_qty != 0
  group by item
)
select
  a.item, a.description, s.sales_qty, s.sales_revenue,
  i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
  all_items a
  left join sales s on
    a.item = s.item
  left join inventory i on
    a.item = i.item
```
There are times when I cannot explain why the query runs slower than I would expect. Sometimes simply materializing the CTEs makes it run better, as expected. Other times it does not, but when I do this:
```
drop table if exists sales;
drop table if exists inventory;

create temporary table sales as
  select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
  from sales_data
  where country = 'USA'
  group by item;

create temporary table inventory as
  select item, sum (on_hand_qty) as inventory_qty
  from inventory_data
  where country = 'USA' and on_hand_qty != 0
  group by item;

select
  a.item, a.description, s.sales_qty, s.sales_revenue,
  i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
  all_items a
  left join sales s on
    a.item = s.item
  left join inventory i on
    a.item = i.item;
```
Suddenly all is right in the world.
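For comparison, the option of simply materializing the CTEs, mentioned above, only needs the MATERIALIZED keyword after `as` (PostgreSQL 12+). A minimal sketch using the first CTE of that query; `sales_data` and `all_items` here are tiny hypothetical stand-ins whose column types are guesses, and the `inventory` CTE would take the same keyword:

```sql
-- Hypothetical stand-ins for the real tables (column types are guesses).
-- Temp tables shadow permanent tables of the same name for this session,
-- so try this in a scratch session.
drop table if exists pg_temp.sales_data, pg_temp.all_items;
create temp table sales_data (item int, country text, qty int, revenue numeric);
create temp table all_items (item int, description text, cost numeric);
insert into sales_data values (1, 'USA', 3, 7.50), (2, 'CAN', 1, 10.00);
insert into all_items values (1, 'widget', 2.50), (2, 'gadget', 10.00);

-- MATERIALIZED makes the planner compute the CTE once and store its result
-- before the outer query uses it, as PostgreSQL 11 and older always did.
with sales as materialized (
  select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
  from sales_data
  where country = 'USA'
  group by item
)
select a.item, a.description, s.sales_qty, s.sales_revenue
from all_items a
left join sales s on a.item = s.item
order by a.item;
```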

In PostgreSQL, a temp table (both its structure and its data) is visible only to the session that created it and is dropped automatically when that session ends. Within the same session, though, a temp table from an earlier run is still there, which is why, to be safe, I always drop first:
```
drop table if exists sales;
```
And I use "if exists" to avoid an error when the table doesn't exist yet.
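Two related options, as a sketch. Qualifying the name with `pg_temp` guarantees the drop only touches a temporary table: an unqualified `drop table if exists sales` drops a permanent `sales` table if no temporary one exists. And `on commit drop` removes the temp table automatically at the end of the transaction. The `sales_tmp` name and the generated rows are placeholders:

```sql
-- pg_temp always refers to this session's own temporary schema,
-- so this can never drop a permanent table.
drop table if exists pg_temp.sales_tmp;

begin;

-- Dropped automatically at COMMIT (or ROLLBACK), so no cleanup is needed.
create temp table sales_tmp on commit drop as
  select g % 10 as item, count(*) as sales_qty
  from generate_series(1, 1000) as g
  group by g % 10;

select item, sales_qty from sales_tmp order by item;

commit;
-- sales_tmp no longer exists at this point.
```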

I rarely use temp tables in ad hoc queries, for the simple reason that they are not as portable as a single SQL statement (you can't hand the final query to another user without also giving them the temp tables). My most common use case is processing inside a procedure or function:
```
create procedure sales_and_inventory()
language plpgsql
as
$BODY$
  BEGIN
    create temp table sales...
    
    insert into sales_inventory
    select ...
    
    drop table sales;
  END;  
$BODY$
```
Hopefully this helps.

Also, to answer your question about indexes: typically I don't create any, but nothing says that's always the right answer. If I put data into a temp table, I assume I'm going to use all or most of it. That said, if you plan to query it multiple times with conditions where an index makes sense, then by all means add one.
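If you do add an index, a minimal sketch is below. One extra step is worth knowing about: autovacuum cannot analyze temporary tables, so the planner has no column statistics for a freshly filled temp table until you run ANALYZE yourself. The table is built from generate_series as a placeholder for the real `sales` aggregate:

```sql
-- Placeholder for the per-item sales aggregate from the answer above.
drop table if exists pg_temp.sales;
create temporary table sales as
  select g % 1000 as item, count(*) as sales_qty
  from generate_series(1, 100000) as g
  group by g % 1000;

-- Optional: mainly worthwhile if you filter or join on item repeatedly.
create index on sales (item);

-- Autovacuum never analyzes temp tables, so collect statistics explicitly.
analyze sales;

select sales_qty from sales where item = 42;  -- 100 for this placeholder data
```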
