skip to Main Content

I have a table called conversationLog. Conversationlog has messageId which belongs to message table. When i am running the query the cost is increasing because of sort mostly.
How can i reduce the sort time.

explain analyze SELECT
    *
FROM
    "conversationLog" AS l
    LEFT JOIN "message" AS "m" ON "m"."id" = l. "messageId"
WHERE
    l. "teamId" = 'uuid'
    AND l. "conversationId" = 'uuid'
    AND l. "type" IN( 'outgoing', 'incoming')
    AND l. "isDeleted" IS NOT TRUE
ORDER BY
    l. "createdAt" DESC
LIMIT 10 OFFSET 0;

Index for the query is also created

CREATE INDEX conversationLog_idx ON "conversationLog" ("teamId", "conversationId", "type", "isDeleted", "createdAt" DESC);

Explain Result

  Limit  (cost=6579.86..6579.88 rows=10 width=2399) (actual time=500.322..500.326 rows=10 loops=1)
  ->  Sort  (cost=6579.86..6581.15 rows=519 width=2399) (actual time=500.320..500.322 rows=10 loops=1)
"        Sort Key: l.""createdAt"" DESC"
        Sort Method: top-N heapsort  Memory: 34kB
        ->  Nested Loop Left Join  (cost=1.26..6568.64 rows=519 width=2399) (actual time=0.039..444.964 rows=84488 loops=1)
"              ->  Index Scan using conversationlog_idx on ""conversationLog"" l  (cost=0.69..2113.03 rows=519 width=835) (actual time=0.027..94.446 rows=84488 loops=1)"
"                    Index Cond: ((""teamId"" = 'uuid'::uuid) AND (""conversationId"" = 'uuid'::uuid) AND ((type)::text = ANY ('{outgoing,incoming}'::text[])))"
"                    Filter: (""isDeleted"" IS NOT TRUE)"
              ->  Index Scan using message_pkey on message m  (cost=0.57..8.59 rows=1 width=1556) (actual time=0.004..0.004 rows=1 loops=84488)
"                    Index Cond: (id = l.""messageId"")"
Planning Time: 0.337 ms
Execution Time: 500.390 ms

This result is when we have 84k records matching condition.

If i just remove the Order By the query cost come down to 1.26 and actual time 0.036.

Explain without order by

Limit  (cost=1.26..127.80 rows=10 width=2391) (actual time=0.036..0.104 rows=10 loops=1)
  ->  Nested Loop Left Join  (cost=1.26..6568.64 rows=519 width=2391) (actual time=0.035..0.102 rows=10 loops=1)
"        ->  Index Scan using conversationlog_idx on ""conversationLog"" l  (cost=0.69..2113.03 rows=519 width=835) (actual time=0.024..0.035 rows=10 loops=1)"
"              Index Cond: ((""teamBotId"" = 'uuid'::uuid) AND (""conversationId"" = 'uuid'::uuid) AND ((type)::text = ANY ('{outgoing,incoming}'::text[])))"
"              Filter: (""isDeleted"" IS NOT TRUE)"
        ->  Index Scan using message_pkey on message m  (cost=0.57..8.59 rows=1 width=1556) (actual time=0.006..0.006 rows=1 loops=10)
"              Index Cond: (id = l.""messageId"")"
Planning Time: 0.262 ms
Execution Time: 0.139 ms

I have also tried creating index like

CREATE INDEX index_c ON "conversationLog" ("createdAt" DESC NULLS LAST);

But that also was not picked by the query.

Why the index is not getting picked with order by?

How can we create the index so, that query picks it?

How can we reduce the sort cost and make this query faster?

2

Answers


  1. The index is not used for ORDER BY because the condition on type is not an equality comparison. You could try a partial index like

    CREATE INDEX ON "conversationLog" ("teamId", "conversationId", "createdAt")
       WHERE "isDeleted" IS NOT TRUE AND type IN ('outgoing', 'incoming'));
    

    It looks like each "conversationLog" can have at most one message, so you could write

    SELECT *
    FROM (SELECT *
          FROM "conversationLog"
          WHERE "teamId" = 'uuid'
            AND "conversationId" = 'uuid'
            AND type IN ('outgoing', 'incoming')
            AND "isDeleted" IS NOT TRUE
          ORDER BY "createdAt" DESC
          LIMIT 10) AS l
    LEFT JOIN "message" AS m ON m.id = l."messageId";
    
    Login or Signup to reply.
  2. Most of the time is not going to the sorting (only about 500.322 – 444.964 of it, or 10% of the total time). But a sort needs to read all of its input before it can return any rows, and this reading all of the input is where most of the time actually does go.

    It can’t use your first index to provide rows already in order, because the position of the "createdAt" column in the index is after two other columns which are not tested for simple equality, the IN list and the IS NOT TRUE. You could fix the IN NOT TRUE by rewriting it to IS FALSE, but this is not necessarily equivalent–they differ in how NULLs are handled. If "isDeleted" is declared as NOT NULL, then they would be equivalent, but the planner will not notice this fact so you would still need to rewrite the query.

    As Laurenz suggested, you could fix the problem with those two columns by creating a partial index, but that would only work when the in-list parameters matched the index condition. Another more flexible possibility would be to reorder the index columns to ("teamId", "conversationId", "createdAt", "type", "isDeleted") or maybe just leave those last two columns off entirely. This way you still get all of the selectivity of the first two columns (which I assume is where the vast majority of selectivity comes from) while avoiding the sort and thus the need to read all rows before sorting.

    The single column index on "createdAt" is not used (probably) because the planner thinks it will be slower. It could avoid the sort, but then the system would need to go through the rows with mismatching "teamId" and "conversationId" discarding them one by one. It apparently thinks that the benefit is not worth the cost. You could verify this by doing set enable_sort=false and seeing if then uses the index, what the estimated cost is, and what the true time is.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search