I have the following query to display the performance of a machine in 8-hour segments (although it can change to 1 hour, 1 week, 1 month, etc.). My table has 2 million records, and the query takes 10 seconds to execute. Is this a reasonable speed, or can it be improved?
I have
- 8 GB of RAM
- Intel Haswell CPU with 4 cores
- PostgreSQL 14.2
- TimescaleDB 2.6.1
- shared_buffers = 1024MB
- temp_buffers = 16MB
- work_mem = 64MB
And the tbl_pieza
table is a hypertable.
create table tbl_pieza
(
id_nu_pieza integer not null,
id_nu_orden_fabricacion integer,
id_nu_referencia integer,
id_nu_operacion integer,
id_nu_maquina integer,
id_nu_usuario integer,
ind_paro integer,
ind_validada integer default 0,
nu_segundos integer,
dtm_inicio_at timestamp default CURRENT_TIMESTAMP not null,
dtm_fin_at timestamp,
ind_estatus integer default 1,
dtm_create_at timestamp,
dtm_update_at timestamp default CURRENT_TIMESTAMP,
ind_retrabajo integer default 0,
primary key (id_nu_pieza, dtm_inicio_at)
);
create index tbl_pieza_dtm_inicio_at_idx
on tbl_pieza (dtm_inicio_at desc);
create index idx_time_range
on tbl_pieza (dtm_inicio_at, dtm_fin_at);
WITH Rangos AS (
SELECT
generate_series(
'2023-05-22 16:23:14'::timestamp,
'2023-05-26 08:23:14'::timestamp,
'8 hour'::interval
) AS inicio,
generate_series(
'2023-05-23 00:23:14'::timestamp,
'2023-05-26 16:23:14'::timestamp,
'8 hour'::interval
) AS fin
),
PiezasPorIntervalo AS (
SELECT
r.inicio,
r.fin,
p.id_nu_operacion,
p.id_nu_maquina,
SUM(
CASE
WHEN EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at) = 0 THEN 0
ELSE GREATEST(0, EXTRACT(epoch FROM LEAST(r.fin, p.dtm_fin_at) - GREATEST(r.inicio, p.dtm_inicio_at)) / EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at))
END
) as PiezasReales
FROM Rangos r
JOIN tbl_pieza p ON p.dtm_inicio_at < r.fin AND p.dtm_fin_at > r.inicio
AND p.id_nu_usuario in (1,8,11,43,44,45,46,47,48,49)
AND p.id_nu_operacion in (84,85,86,87,88,89,90,91,92,93,118,119)
AND p.id_nu_referencia in (46,58,59,60)
AND p.id_nu_maquina in (1,2,3,8)
GROUP BY r.inicio, r.fin, p.id_nu_operacion, p.id_nu_maquina
)
SELECT
p.inicio as fecha_inicio,
p.fin as fecha_fin,
p.id_nu_maquina as id_maquina,
CASE
WHEN o.ciclo_estimado + o.tiempo_cambio_estimado = 0 THEN 0
ELSE (p.PiezasReales::decimal / (28800 / (o.ciclo_estimado + o.tiempo_cambio_estimado))) * 100
END as resultado
FROM PiezasPorIntervalo p
JOIN operacion o ON o.id_operacion = p.id_nu_operacion
ORDER BY fecha_inicio;
I’m running this on a system with the mentioned specifications. Any suggestions on how to optimize this query for better performance would be greatly appreciated. Thank you!
Output of EXPLAIN (ANALYZE, BUFFERS)
PiezasPorIntervalo is the part that takes the longest, I will explain right away what I do
Suppose we have a production table with the following entries:
Piece | Production Start | Production End
----- | ------------------------| -------------------------
A | 2023-05-23 08:00:00 | 2023-05-23 10:00:00
B | 2023-05-23 09:30:00 | 2023-05-23 12:00:00
C | 2023-05-23 10:30:00 | 2023-05-23 11:30:00
D | 2023-05-23 12:00:00 | 2023-05-23 13:30:00
And let’s say we want to calculate the number of "PiezasReales" in a specific time interval from ‘2023-05-23 09:00:00’ to ‘2023-05-23 11:00:00’. Here’s the step-by-step calculation:
Duration of Intersection between the Time Range and Piece Production Time:
For Piece A: MIN(2023-05-23 11:00:00, 2023-05-23 10:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 08:00:00) = 1 hour
For Piece B: MIN(2023-05-23 11:00:00, 2023-05-23 12:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 09:30:00) = 0.5 hours
For Piece C: MIN(2023-05-23 11:00:00, 2023-05-23 11:30:00) - MAX(2023-05-23 09:00:00, 2023-05-23 10:30:00) = 0.5 hours
For Piece D: No intersection with the interval, so the duration is 0.
Total Duration of the Piece:
For Piece A: 2023-05-23 10:00:00 - 2023-05-23 08:00:00 = 2 hours
For Piece B: 2023-05-23 12:00:00 - 2023-05-23 09:30:00 = 2.5 hours
For Piece C: 2023-05-23 11:30:00 - 2023-05-23 10:30:00 = 1 hour
For Piece D: 2023-05-23 13:30:00 - 2023-05-23 12:00:00 = 1.5 hours
Fraction of Time in the Interval:
For Piece A: 1 hour / 2 hours = 0.5
For Piece B: 0.5 hours / 2.5 hours = 0.2
For Piece C: 0.5 hours / 1 hour = 0.5
For Piece D: 0 (since there's no intersection with the interval).
Number of PiezasReales in the Interval:
Sum of the fractions of time in the interval: 0.5 + 0.2 + 0.5 + 0 = 1.2
So, in this specific time interval, there were "RealPieces" equivalent to 1.2 pieces, taking into account the fraction of time each piece was in production within the interval.
2
Answers
The first thing to do is to create a table timetable with a unique column as a PK to replace the generate series statement inside the query :
The second thing to do is to replace your IN operators by a serie of temporary tables with an unique colonne that is a PK
The last thing to do is to use INCLUDE indexes instead of classical ones to have covering indexes…
All of your time (and indeed, more than all of your time, I don’t know how that happens) seem to be going to JIT compilation. And you probably get no benefit from it. Turn off JIT. I think it was a mistake for the developers to turn JIT on by default, as it seems to hurt more people than it helps. So just turn it off and be done with it.
If it is still too slow, post a new plan. The effects of JIT tend to show up in random places that make no sense, making the current JIT-polluted plan hard to interpret.