I have a PostgreSQL table with a bunch of events, each containing a timestamp, id, and action. I expect a few million events per day, and expect the table to grow to several billion rows eventually.
I want to query the most recent event for each item (there are a few hundred thousand distinct items) before a given date, but the query is extremely slow, taking 1-2 hours minimum (the table currently has around 100 million rows). Is there a way to speed it up?
SELECT a.* FROM (
    WITH events AS (
        SELECT
            ROW_NUMBER() OVER (PARTITION BY item ORDER BY time_stamp DESC) AS rn,
            *
        FROM event_updates
        WHERE time_stamp < '2023-05-01'
    )
    SELECT * FROM events WHERE rn = 1 ORDER BY item
) a;
DDL for the table:
CREATE TABLE "event_updates" (
    "id" int4 NOT NULL DEFAULT nextval('event_updates_id_seq'::regclass),
    "time_stamp" timestamptz(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    "item" varchar(32) COLLATE "pg_catalog"."default" NOT NULL,
    "event_type" int2
)
PARTITION BY RANGE ("time_stamp");
ALTER TABLE "event_updates" OWNER TO "owners";
2 Answers
First of all, you need an index on item and the timestamp. Second, select the max(time_stamp) for each item and then join back to fetch the row you need:
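A sketch of that approach, using the table and column names from the question's DDL (note: if two rows for the same item share the same time_stamp, this returns both):

```sql
-- Supporting index: item first, newest timestamp first.
CREATE INDEX ON event_updates (item, time_stamp DESC);

-- Latest timestamp per item, then join back for the full row:
SELECT e.*
FROM (
    SELECT item, max(time_stamp) AS time_stamp
    FROM   event_updates
    WHERE  time_stamp < '2023-05-01'
    GROUP  BY item
) latest
JOIN event_updates e USING (item, time_stamp);
```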
CREATE INDEX CONCURRENTLY does not work on a partitioned table directly, so you might want to create the index per partition, concurrently, to avoid locking issues, then attach the partition indexes to a parent index. It takes longer, however.
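On PostgreSQL 11 or later, that procedure could be sketched as follows; the partition name below is hypothetical:

```sql
-- CREATE INDEX CONCURRENTLY is not supported on the partitioned parent,
-- so create an (initially invalid) index on the parent only ...
CREATE INDEX event_updates_item_ts_idx
    ON ONLY event_updates (item, time_stamp DESC);

-- ... then build each partition's index without blocking writes ...
CREATE INDEX CONCURRENTLY event_updates_p2023_item_ts_idx
    ON event_updates_p2023 (item, time_stamp DESC);

-- ... and attach it. The parent index becomes valid once every
-- partition's index has been attached.
ALTER INDEX event_updates_item_ts_idx
    ATTACH PARTITION event_updates_p2023_item_ts_idx;
```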
Emulate an index skip scan for best performance:
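A sketch of the recursive-CTE emulation, assuming the table definition from the question; each iteration jumps to the newest row of the next item in index order, touching only one index tuple per item:

```sql
WITH RECURSIVE latest AS (
   (  -- newest qualifying row of the first item in index order
   SELECT e.*
   FROM   event_updates e
   WHERE  e.time_stamp < '2023-05-01'
   ORDER  BY e.item, e.time_stamp DESC
   LIMIT  1
   )
   UNION ALL
   SELECT n.*
   FROM   latest l
   CROSS  JOIN LATERAL (
      -- newest qualifying row of the next item
      SELECT e.*
      FROM   event_updates e
      WHERE  e.item > l.item
      AND    e.time_stamp < '2023-05-01'
      ORDER  BY e.item, e.time_stamp DESC
      LIMIT  1
      ) n
   )
SELECT * FROM latest;
```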
In combination with a matching index (essential):
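For the skip-scan emulation, the matching index would look like this (descending timestamp, so each item's newest row comes first):

```sql
CREATE INDEX ON event_updates (item, time_stamp DESC);
```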
More optimization may be possible.
I’ll explain some more after you have disclosed the requested information.