So I have this query which is extremely slow to run:

```sql
SELECT
    "conversation".*,
    "teamBotLocation"."id" AS "teamBotLocation.id",
    "teamBotLocation"."name" AS "teamBotLocation.name"
FROM
    "conversation" AS "conversation"
    LEFT OUTER JOIN "teamBotLocation" AS "teamBotLocation"
        ON "conversation"."teamBotLocationId" = "teamBotLocation"."id"
WHERE
    "conversation"."teamBotId" = 'some_id'
    AND "conversation"."teamBotLocationId" IN ('some_id', 'some_id')
    AND "conversation"."isDeleted" = false
    AND "conversation"."isBlocked" = false
    AND "conversation"."isBroadcast" = false
    AND "conversation"."isGroup" = false
ORDER BY
    CASE
        WHEN "firstName" = '' THEN NULL
        ELSE 0
    END ASC NULLS LAST,
    "conversation"."firstName" NULLS LAST,
    "conversation"."id" DESC
LIMIT 10 OFFSET 100;
```
What does this query actually do?
I have a conversation table with almost 20 million rows, and I display these conversations in a table with a default page size of 10. The table can be sorted by any of these five columns: firstName, lastName, phone, email, and createdAt, in ascending or descending order.

I am currently using this query in production. For smaller result sets the execution time is below 1 s, which is fine. But when the result set runs into the millions, say 1 million rows, the execution time is far too slow, around 10 s. I found out that the primary reason for the slowness is the sorting.
This is my query plan in text:

```
Limit  (cost=353265.17..353265.20 rows=10 width=1742) (actual time=130013.935..130013.939 rows=10 loops=1)
  ->  Sort  (cost=353265.17..353481.27 rows=86438 width=1742) (actual time=130013.934..130013.937 rows=10 loops=1)
        Sort Key: (CASE WHEN ((conversation."firstName")::text = ''::text) THEN NULL::integer ELSE 0 END), conversation."firstName", conversation.id DESC
        Sort Method: top-N heapsort  Memory: 37kB
        ->  Hash Left Join  (cost=454.80..351397.28 rows=86438 width=1742) (actual time=12.856..128302.754 rows=1387007 loops=1)
              Hash Cond: (conversation."teamBotLocationId" = "teamBotLocation".id)
              ->  Index Scan using "conversation_teambot_teambotLocation_email" on conversation  (cost=0.56..350499.44 rows=86438 width=1706) (actual time=1.165..127696.780 rows=1387007 loops=1)
                    Index Cond: (("teamBotId" = 'f92eb170-28b2-11ed-a128-098b9f10e3d6'::uuid) AND ("teamBotLocationId" = ANY ('{f9978ff0-28b3-11ed-a128-098b9f10e3d6,f1e505b0-f033-11ed-b347-43d7d9fb1309}'::uuid[])))
                    Filter: ((NOT "isDeleted") AND (NOT "isBlocked") AND (NOT "isBroadcast") AND (NOT "isGroup"))
                    Rows Removed by Filter: 6940
              ->  Hash  (cost=434.66..434.66 rows=1566 width=32) (actual time=11.677..11.678 rows=1566 loops=1)
                    Buckets: 2048  Batches: 1  Memory Usage: 115kB
                    ->  Seq Scan on "teamBotLocation"  (cost=0.00..434.66 rows=1566 width=32) (actual time=0.004..11.326 rows=1566 loops=1)
Planning Time: 35.488 ms
Execution Time: 130014.015 ms
```
I need a solution that executes the query in under 1-2 s, keeping in mind that the sorting can be done in either ascending or descending order on any of these five columns: firstName, lastName, phone, email, and createdAt.
**Note** -
- If any row has an empty string in a particular column, that row should be kept at the end while sorting.
- To keep in mind, all these conversations belong to different accounts in a 1-to-1 mapping, so technically this query is for a particular account. Also, each account can have different locations, so the conversations also have a 1-to-1 mapping with these locations.
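As a minimal illustration of the empty-string-last requirement, the CASE/NULLS LAST trick from the query behaves like this on a throwaway value list (PostgreSQL syntax):

```sql
-- Empty strings map to NULL, which NULLS LAST pushes to the end;
-- everything else sorts by name ascending.
SELECT name
FROM (VALUES ('bob'), (''), ('alice')) AS t(name)
ORDER BY CASE WHEN name = '' THEN NULL ELSE 0 END ASC NULLS LAST,
         name;
-- Result order: alice, bob, ''
```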
2 Answers
You can first sort by the firstName column with LIMIT 10, and only after that sort the resulting rows by the other four columns; sorting 10 rows is much quicker.
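Read literally, this suggestion could be sketched as a subquery that pages on firstName first and then re-sorts only the small page (table and column names follow the question; the filters are abbreviated, and whether this preserves the ordering you actually want needs checking):

```sql
-- Page on "firstName" first, then sort only the 10 returned rows
-- by the remaining columns.
SELECT *
FROM (
    SELECT "conversation".*
    FROM "conversation"
    WHERE "conversation"."teamBotId" = 'some_id'
      AND "conversation"."isDeleted" = false
    ORDER BY ("firstName" = ''), "firstName", "id" DESC
    LIMIT 10 OFFSET 100
) AS page
ORDER BY "firstName", "lastName", "phone", "email", "createdAt";
```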
My answer assumes that "firstName" is a column of conversation. Otherwise, there is little hope for your query.

You can simplify the ORDER BY clause by sorting on the boolean expression "firstName" = '' instead of the CASE expression, because FALSE < TRUE.
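Spelled out, the simplified ORDER BY might look like this (a sketch: the boolean expression is FALSE for non-empty names, which therefore sort first):

```sql
ORDER BY "conversation"."firstName" = '',  -- FALSE (non-empty) sorts before TRUE (empty)
         "conversation"."firstName",
         "conversation"."id" DESC
```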
.Then you need an index to support the
ORDER BY
clause (and, in part, theWHERE
condition):and an index for the nested loop join (on
"teamBotLocation".id
, but if that is the primary key, you already have an index there).If you need to support different
ORDER BY
clauses, you need an index for each of these. Yes, that is expensive and will slow down data modifications. But if you need these queries to be fast, that’s the price you have to pay.
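Under those assumptions, a supporting index for this particular ORDER BY might be sketched as follows (the column list and index name are an assumption, not taken from the original answer):

```sql
-- Leads with the WHERE equality column, then mirrors the simplified sort keys.
-- PostgreSQL allows expressions such as ("firstName" = '') in an index.
CREATE INDEX conversation_firstname_page_idx
    ON conversation ("teamBotId", ("firstName" = ''), "firstName", id DESC);
```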