Postgresql - Postgres Index to speed up LEFT OUTER JOIN

user5198729
September 29, 2022
261 views
0 votes
2 Answers

Within my db I have table prediction_fsd with about 5 million entries. The site table contains approx 3 million entries. I need to execute queries that look like

SELECT prediction_fsd.id AS prediction_fsd_id, 
       prediction_fsd.site_id AS prediction_fsd_site_id, 
       prediction_fsd.html_hash AS prediction_fsd_html_hash, 
       prediction_fsd.prediction AS prediction_fsd_prediction, 
       prediction_fsd.algorithm AS prediction_fsd_algorithm, 
       prediction_fsd.model_version AS prediction_fsd_model_version,
       prediction_fsd.timestamp AS prediction_fsd_timestamp, 
       site_1.id AS site_1_id, 
       site_1.url AS site_1_url, 
       site_1.status AS site_1_status 
  FROM prediction_fsd
  LEFT OUTER JOIN site AS site_1
         ON site_1.id = prediction_fsd.site_id 
 WHERE 95806 = prediction_fsd.site_id
   AND prediction_fsd.algorithm = 'xgboost'
 ORDER BY prediction_fsd.timestamp DESC 
 LIMIT 1

at the moment this query takes about ~4 seconds. I’d like to reduce that by introducing an index. Which tables and fields should I include in that index. I’m having troubles properly understanding the EXPLAIN ANALYZE output of Postgres

Answers

Chosen as BEST ANSWER
- user5198729
- September 29, 2022 at 2:15 pm
- 0 votes
0
```
CREATE INDEX prediction_fsd_site_id_algorithm_timestamp
    ON public.prediction_fsd USING btree
    (site_id, algorithm, "timestamp" DESC)
    TABLESPACE pg_default;
```
By introducing a combined index as suggested by Frank Heikens I was able to bring down the query execution time to 0.25s

(Edit)

- OJones
- September 29, 2022 at 2:19 pm
- 0 votes
0
These three SQL lines point to a possible BTREE index to help you.
```
WHERE 95806 = prediction_fsd.site_id
   AND prediction_fsd.algorithm = 'xgboost'
 ORDER BY prediction_fsd.timestamp DESC 
```
You’re filtering the rows of the table by equality on two columns, and ordering by the third column. So try this index.
```
CREATE INDEX site_alg_ts ON prediction_fsd 
      (site_id, algorithm, timestamp DESC);
```
This BTREE index lets PostgreSQL random-access it to the first eligible row, which happens also to be the row you want with your ORDER BY ... LIMIT 1 clause.

The query plan in your question says that PostgreSQL did an expensive Parallel Sequential Scan on all five megarows of that table. This index will almost certainly change that to a cheap index lookup.

On the other table, it appears that you already look up rows in it via the primary key id. So you don’t need any other index for that one.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Postgres Index to speed up LEFT OUTER JOIN

Answers