I have a table race_racer that contains time-serialized logs for each racer in a race. I am trying to select the latest record for a given racer in x number of races prior to a date, basically selecting their finalized stats. The table has approximately 64 million records. I'm using a framework that doesn't support window functions, and DB functions are not an option. I cannot select based on a list of race_ids because a racer might not be in a given race.
Postgres 15
Table DDL
CREATE TABLE race_racer (
race_racer_id SERIAL PRIMARY KEY,
log_id integer NOT NULL,
racer_id integer NOT NULL,
race_id integer NOT NULL,
stats jsonb,
created_at timestamp without time zone NOT NULL DEFAULT now()
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX race_racer_pkey ON race_racer(race_racer_id int4_ops);
CREATE INDEX race_racer_log_id_idx ON race_racer(log_id int4_ops);
CREATE INDEX race_racer_racer_id_idx ON race_racer(racer_id int4_ops);
CREATE INDEX race_racer_race_id_idx ON race_racer(race_id int4_ops);
CREATE INDEX race_racer_created_at_idx ON race_racer(created_at timestamp_ops);
race_racer_id is an auto-increment and log_id is the record ID from our vendor. Duplicate log_id records can exist because the data is populated in batches.
Query
SELECT MAX(p2.race_racer_id) race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
Limit (cost=115.70..2445.14 rows=25 width=12) (actual time=150899.392..369576.667 rows=25 loops=1)
Output: (max(race_racer_id)), log_id, race_id
Buffers: shared hit=3984768 read=553269 written=66
-> GroupAggregate (cost=115.70..9417387.87 rows=101068 width=12) (actual time=150899.390..369576.638 rows=25 loops=1)
Output: max(race_racer_id), log_id, race_id
Group Key: p2.log_id, p2.race_id
Buffers: shared hit=3984768 read=553269 written=66
-> Incremental Sort (cost=115.70..9415611.55 rows=102085 width=12) (actual time=147171.318..369575.442 rows=7578 loops=1)
Output: log_id, race_id, race_racer_id
Sort Key: p2.log_id DESC, p2.race_id
Presorted Key: p2.log_id
Full-sort Groups: 26 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
Pre-sorted Groups: 26 Sort Method: quicksort Average Memory: 41kB Peak Memory: 41kB
Buffers: shared hit=3984768 read=553269 written=66
-> Index Scan Backward using race_racer_log_id_idx on public.race_racer p2 (cost=0.44..9411732.46 rows=102085 width=12) (actual time=142302.220..369572.386 rows=7865 loops=1)
Output: log_id, race_id, race_racer_id
Filter: ((p2.created_at < '2024-08-01 00:00:00'::timestamp without time zone) AND (p2.racer_id = 10002093))
Rows Removed by Filter: 4547156
Buffers: shared hit=3984765 read=553269 written=66
Settings: effective_cache_size = '19256MB', effective_io_concurrency = '300', maintenance_io_concurrency = '300', random_page_cost = '1', work_mem = '25671kB'
Query Identifier: 7029425313897609646
Planning:
Buffers: shared hit=182 read=10
Planning Time: 11.919 ms
Execution Time: 369579.279 ms
The query executes in about 45 seconds (and over six minutes in the plan above), which is incredibly slow. Please help! If I select the latest race data for a specific racer using race_id, it resolves in under 20 ms. I've tried adding indexes and restructuring the query, to no avail.
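For reference, the fast per-race lookup mentioned above is not shown in the question; a hedged sketch, assuming the same MAX-per-log pattern restricted to a single race:

```sql
-- Hypothetical per-race variant of the query. The equality filter on
-- race_id lets the existing race_racer_race_id_idx narrow the scan to
-- one race's rows, which is why it resolves quickly.
SELECT MAX(race_racer_id) AS race_racer_id
FROM race_racer
WHERE racer_id = 10002093
  AND race_id = 555   -- hypothetical race_id
GROUP BY log_id
ORDER BY log_id DESC
LIMIT 25;
```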
2 Answers
This query could benefit from a multicolumn index covering both the filter columns and the sort column. You might have to change/optimize the order of the columns in the index.
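A plausible candidate, assuming the goal is to serve the racer_id equality and created_at range filter from the index (the index name and column order here are illustrative, not from the original answer):

```sql
-- Hypothetical index: equality column first, then the range filter,
-- then log_id so the planner can read group keys straight off the index.
CREATE INDEX race_racer_racer_created_log_idx
    ON race_racer (racer_id, created_at, log_id);
```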
Try an index on (racer_id, log_id). That way the planner doesn't have to choose between the selectivity of racer_id and the ordering of log_id; it gets both at the same time.
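A minimal sketch of that suggestion (the index name is illustrative; a plain ascending btree works because Postgres can scan it backward to satisfy ORDER BY log_id DESC):

```sql
-- Hypothetical two-column index: racer_id narrows the scan to one racer,
-- and log_id provides the ordering, so the backward index scan can stop
-- early once the LIMIT is satisfied instead of filtering millions of rows.
CREATE INDEX race_racer_racer_id_log_id_idx
    ON race_racer (racer_id, log_id);
```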