Postgresql - why index only scan is too costly?

OscarRivas
August 31, 2023
282 views
0 votes
2 Answers

I have this for query for a table of a size of 2 700 000 rows in Heroku Postgres 12.16:

SELECT 1 AS one
FROM "users"
WHERE (
    external_id = 18
    AND (email = '[email protected]'
         OR uuid = '779c7963-67b2-43ea-b19b-028759a146dc'))
LIMIT 1

It takes 1497.968 ms to execute.

With EXPLAIN ANALYZE it returns the following:

"Limit  (cost=1000.11..41646.06 rows=1 width=4) (actual time=491.141..1497.948 rows=1 loops=1)"
"  ->  Gather  (cost=1000.11..82292.01 rows=2 width=4) (actual time=491.140..1497.946 rows=1 loops=1)"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        ->  Parallel Index Only Scan using index_users_on_whitelabel_id_and_email_and_uuid on users  (cost=0.11..81291.81 rows=1 width=4) (actual time=953.762..953.762 rows=0 loops=2)"
"              Index Cond: (whitelabel_id = 18)"
"              Filter: (((email)::text = '[email protected]'::text) OR ((uuid)::text = '779c7963-67b2-43ea-b19b-028759a146dc'::text))"
"              Rows Removed by Filter: 1111474"
"              Heap Fetches: 3238"
"Planning Time: 0.139 ms"
"Execution Time: 1497.968 ms"

I see the total cost is too high (81291.81) and I suspect is the reason for the query run so slow.

Why is the reason of the total cost and how I can optimize it?

I already run VACCUUM ANALYZE users expecting the will improve the query, but it stayed the same.

Answers

- jjanes
- August 31, 2023 at 4:55 am
- 0 votes
0
The cost is an estimate. The high cost doesn’t cause the query to be slow, it merely reflects the expectation that the query will be slow. If the planner correctly estimates you will be filtering out so many rows, that is good reason to think it will be slow.

If you have separate indexes on both (external_id, email) and (external_id, uuid), then you should be able to get a very efficient BitmapOr operation. You apparently already have an index on (external_id, email, uuid), so the that first one is covered, you would only have to add the 2nd.

Login or Signup to reply.

- hermit
- August 31, 2023 at 6:10 am
- 0 votes
0
In addition to indexing, the prepared statement might help as well. You will create prepared statement once and query multiple times with different parameters.

So, a preapared statement something like this can be created.
```
PREPARE get_users(int,text,text) SELECT 1 AS one
FROM "users"
WHERE (
    external_id = $1
    AND (email = $2
         OR uuid = $3))
LIMIT 1;

EXECUTE get_users(18, '[email protected]', '779c7963-67b2-43ea-b19b-028759a146dc');
```
It prepares the query once and reuses the execution plan for each execution afterwards, which will help to improve the performance.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – why index only scan is too costly?

Answers