skip to Main Content

I have the following query to display the performance of a machine in 8-hour segments (although it can change to 1 hour, 1 week, 1 month, etc.). My table has 2 million records, and the query takes 10 seconds to execute. Is this a reasonable speed, or can it be improved?

I have

  • 8 GB of RAM
  • Intel Haswell CPU with 4 cores
  • PostgreSQL 14.2
  • TimescaleDB 2.6.1
  • shared_buffers = 1024MB
  • temp_buffers = 16MB
  • work_mem = 64MB

And the tbl_pieza table is a hypertable.

create table tbl_pieza
(
    id_nu_pieza             integer                             not null,
    id_nu_orden_fabricacion integer,
    id_nu_referencia        integer,
    id_nu_operacion         integer,
    id_nu_maquina           integer,
    id_nu_usuario           integer,
    ind_paro                integer,
    ind_validada            integer   default 0,
    nu_segundos             integer,
    dtm_inicio_at           timestamp default CURRENT_TIMESTAMP not null,
    dtm_fin_at              timestamp,
    ind_estatus             integer   default 1,
    dtm_create_at           timestamp,
    dtm_update_at           timestamp default CURRENT_TIMESTAMP,
    ind_retrabajo           integer   default 0,
    primary key (id_nu_pieza, dtm_inicio_at)
);

create index tbl_pieza_dtm_inicio_at_idx
    on tbl_pieza (dtm_inicio_at desc);

create index idx_time_range
    on tbl_pieza (dtm_inicio_at, dtm_fin_at);

WITH Rangos AS (
    SELECT
        generate_series(
            '2023-05-22 16:23:14'::timestamp,
            '2023-05-26 08:23:14'::timestamp,
            '8 hour'::interval
        ) AS inicio,
        generate_series(
            '2023-05-23 00:23:14'::timestamp,
            '2023-05-26 16:23:14'::timestamp,
            '8 hour'::interval
        ) AS fin
),
PiezasPorIntervalo AS (
    SELECT
        r.inicio,
        r.fin,
        p.id_nu_operacion,
        p.id_nu_maquina,
        SUM(
            CASE
                WHEN EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at) = 0 THEN 0
                ELSE GREATEST(0, EXTRACT(epoch FROM LEAST(r.fin, p.dtm_fin_at) - GREATEST(r.inicio, p.dtm_inicio_at)) / EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at))
            END
        ) as PiezasReales
    FROM Rangos r
    JOIN tbl_pieza p ON p.dtm_inicio_at < r.fin AND p.dtm_fin_at > r.inicio
                            AND p.id_nu_usuario in (1,8,11,43,44,45,46,47,48,49)
                            AND p.id_nu_operacion in (84,85,86,87,88,89,90,91,92,93,118,119)
                            AND p.id_nu_referencia in (46,58,59,60)
                            AND p.id_nu_maquina in (1,2,3,8)
    GROUP BY r.inicio, r.fin, p.id_nu_operacion, p.id_nu_maquina
)
SELECT
    p.inicio as fecha_inicio,
    p.fin as fecha_fin,
    p.id_nu_maquina as id_maquina,
    CASE
        WHEN o.ciclo_estimado + o.tiempo_cambio_estimado = 0 THEN 0
        ELSE (p.PiezasReales::decimal / (28800 / (o.ciclo_estimado + o.tiempo_cambio_estimado))) * 100
    END as resultado
FROM PiezasPorIntervalo p
JOIN operacion o ON o.id_operacion = p.id_nu_operacion
ORDER BY fecha_inicio;

I’m running this on a system with the mentioned specifications. Any suggestions on how to optimize this query for better performance would be greatly appreciated. Thank you!

Output of EXPLAIN (ANALYZE, BUFFERS)

Explain output

PiezasPorIntervalo is the part that takes the longest, I will explain right away what I do

Suppose we have a production table with the following entries:

Piece | Production Start         | Production End
----- | ------------------------| -------------------------
A     | 2023-05-23 08:00:00     | 2023-05-23 10:00:00
B     | 2023-05-23 09:30:00     | 2023-05-23 12:00:00
C     | 2023-05-23 10:30:00     | 2023-05-23 11:30:00
D     | 2023-05-23 12:00:00     | 2023-05-23 13:30:00

And let’s say we want to calculate the number of "PiezasReales" in a specific time interval from ‘2023-05-23 09:00:00’ to ‘2023-05-23 11:00:00’. Here’s the step-by-step calculation:

Duration of Intersection between the Time Range and Piece Production Time:

For Piece A: MIN(2023-05-23 11:00:00, 2023-05-23 10:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 08:00:00) = 1 hour
For Piece B: MIN(2023-05-23 11:00:00, 2023-05-23 12:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 09:30:00) = 0.5 hours
For Piece C: MIN(2023-05-23 11:00:00, 2023-05-23 11:30:00) - MAX(2023-05-23 09:00:00, 2023-05-23 10:30:00) = 0.5 hours
For Piece D: No intersection with the interval, so the duration is 0.

Total Duration of the Piece:

For Piece A: 2023-05-23 10:00:00 - 2023-05-23 08:00:00 = 2 hours
For Piece B: 2023-05-23 12:00:00 - 2023-05-23 09:30:00 = 2.5 hours
For Piece C: 2023-05-23 11:30:00 - 2023-05-23 10:30:00 = 1 hour
For Piece D: 2023-05-23 13:30:00 - 2023-05-23 12:00:00 = 1.5 hours

Fraction of Time in the Interval:

For Piece A: 1 hour / 2 hours = 0.5
For Piece B: 0.5 hours / 2.5 hours = 0.2
For Piece C: 0.5 hours / 1 hour = 0.5
For Piece D: 0 (since there's no intersection with the interval).

Number of PiezasReales in the Interval:

Sum of the fractions of time in the interval: 0.5 + 0.2 + 0.5 + 0 = 1.2
So, in this specific time interval, there were "RealPieces" equivalent to 1.2 pieces, taking into account the fraction of time each piece was in production within the interval.

2

Answers


  1. The first thing to do is to create a table timetable with a unique column as a PK to replace the generate series statement inside the query :

    CREATE TABLE TIME_SERIES AS
    SELECT *
    FROM generate_series(
                '2023-05-22 16:23:14'::timestamp,
                '2023-05-26 08:23:14'::timestamp,
                '8 hour'::interval
            ) AS d UNION 
    SELECT * 
    FROM generate_series(
                '2023-05-23 00:23:14'::timestamp,
                '2023-05-26 16:23:14'::timestamp,
                '8 hour'::interval
            ) AS f;
    

    The second thing to do is to replace your IN operators by a serie of temporary tables with an unique colonne that is a PK

    The last thing to do is to use INCLUDE indexes instead of classical ones to have covering indexes…

    Login or Signup to reply.
  2. All of your time (and indeed, more than all of your time, I don’t know how that happens) seem to be going to JIT compilation. And you probably get no benefit from it. Turn off JIT. I think it was a mistake for the developers to turn JIT on by default, as it seems to hurt more people than it helps. So just turn it off and be done with it.

    If it is still too slow, post a new plan. The effects of JIT tend to show up in random places that make no sense, making the current JIT-polluted plan hard to interpret.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search