I have this query on a table of about 2,700,000 rows in Heroku Postgres 12.16:
SELECT 1 AS one
FROM "users"
WHERE (
external_id = 18
AND (email = '[email protected]'
OR uuid = '779c7963-67b2-43ea-b19b-028759a146dc'))
LIMIT 1
It takes 1497.968 ms to execute. EXPLAIN ANALYZE returns the following:
"Limit (cost=1000.11..41646.06 rows=1 width=4) (actual time=491.141..1497.948 rows=1 loops=1)"
" -> Gather (cost=1000.11..82292.01 rows=2 width=4) (actual time=491.140..1497.946 rows=1 loops=1)"
" Workers Planned: 1"
" Workers Launched: 1"
" -> Parallel Index Only Scan using index_users_on_whitelabel_id_and_email_and_uuid on users (cost=0.11..81291.81 rows=1 width=4) (actual time=953.762..953.762 rows=0 loops=2)"
" Index Cond: (whitelabel_id = 18)"
" Filter: (((email)::text = '[email protected]'::text) OR ((uuid)::text = '779c7963-67b2-43ea-b19b-028759a146dc'::text))"
" Rows Removed by Filter: 1111474"
" Heap Fetches: 3238"
"Planning Time: 0.139 ms"
"Execution Time: 1497.968 ms"
I see the total cost is very high (81291.81), and I suspect that is the reason the query runs so slowly.
What is the reason for the high total cost, and how can I optimize it?
I already ran VACUUM ANALYZE users
expecting it would improve the query, but it stayed the same.
2 Answers
The cost is an estimate. The high cost doesn't cause the query to be slow; it merely reflects the expectation that the query will be slow. If the planner correctly estimates that you will be filtering out that many rows, that is a good reason to think it will be slow.
If you have separate indexes on both
(external_id, email)
and (external_id, uuid)
, then you should be able to get a very efficient BitmapOr operation. You apparently already have an index on (external_id, email, uuid)
, so the first case is covered; you would only have to add the second index. In addition to indexing, a prepared statement might help as well. You create the prepared statement once and then execute it multiple times with different parameters.
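For the missing second index, something like this should work (the index name is illustrative; CONCURRENTLY avoids locking writes while building on a large table):

```sql
-- Illustrative name; covers the (external_id, uuid) branch of the OR
CREATE INDEX CONCURRENTLY index_users_on_external_id_and_uuid
    ON users (external_id, uuid);
```

With both indexes in place, the planner can combine a bitmap index scan on each into a BitmapOr instead of filtering a million rows inside one index.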
So, a prepared statement something like this can be created.
It prepares the query once and reuses the execution plan on each subsequent execution, which helps improve performance.
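A sketch of what that prepared statement might look like (the statement name and parameter types are assumptions; the plan above shows both email and uuid being compared as text):

```sql
-- Hypothetical statement name and parameter types
PREPARE find_user (bigint, text, text) AS
SELECT 1 AS one
FROM users
WHERE external_id = $1
  AND (email = $2 OR uuid = $3)
LIMIT 1;

-- Reuse the plan with different parameters:
EXECUTE find_user(18, '[email protected]', '779c7963-67b2-43ea-b19b-028759a146dc');
```

Note that PREPARE is per-session in PostgreSQL; in application code, most drivers and ORMs expose prepared statements through their own APIs rather than raw PREPARE/EXECUTE.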