skip to Main Content

I have a PostgreSQL table with a bunch of events, each containing a timestamp, id, and action. I expect a few million events per day, and expect the table to grow to several billion rows eventually.

I want to query the most recent events for each item (a few hundred thousand distinct ids) before a given date, but currently the query is extremely slow, taking 1-2 hours minimum (the table currently has around 100 million rows). Is there a way to speed up this query?

SELECT a.* FROM (
    with events as (
        SELECT
          ROW_NUMBER() OVER (PARTITION BY item ORDER BY time_stamp DESC) AS rn,
          *
        FROM event_updates
        WHERE time_stamp < '2023-05-01'
    )
    SELECT * FROM events WHERE rn=1 ORDER BY item
) a;

DDL for table

CREATE TABLE "event_updates" (
  "id" int4 NOT NULL DEFAULT nextval("event_updates"::regclass),
  "time_stamp" timestamptz(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "item" varchar(32) COLLATE "pg_catalog"."default" NOT NULL,
  "event_type" int2,
)
PARTITION BY ();
ALTER TABLE "event_updates" OWNER TO "owners";

2

Answers


  1. First of all, you need an index on item and the timestamp. Second, you select the max(timestamp) for each item and then select the record you need:

    CREATE INDEX idx_event_updates_item_time_stamp
        ON event_updates(item, time_stamp);
    
    SELECT *
    FROM event_updates
        JOIN (SELECT item
                   , MAX(time_stamp) time_stamp
              FROM event_updates
              GROUP BY item
              ) sub USING (item, time_stamp);
    

    You might want to create the index per partition, concurrently, to avoid locking issues. It takes longer however.

    Login or Signup to reply.
  2. Emulate an index skip scan fro best performance:

    WITH RECURSIVE cte AS (
       (
       SELECT *
       FROM   event_updates
       ORDER  BY item, time_stamp DESC
       LIMIT  1
       )
       UNION ALL
       SELECT e.*
       FROM   cte c
       CROSS  JOIN LATERAL (
          SELECT e.*
          FROM   event_updates e
          WHERE  e.item > c.item
          ORDER  BY item, time_stamp DESC
          LIMIT  1
          ) e
       )
    TABLE  cte;
    

    In combination with a matching index (essential):

    CREATE INDEX event_updates_item_time_stamp_idx ON event_updates(item, time_stamp DESC);
    

    See:

    More optimization may be possible.
    I’ll explain some more after you have disclosed requested information.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search