skip to Main Content

I want to optimize the following DB SELECT query:

SELECT
            ls.caller_id,
            ls.caller_path_id,
            ls.caller_path_name,
            ls.caller_data->>'worker' AS worker,
            ls.caller_stage,
            ls.current_status,
            ls.date_inserted AS date_closed,
            ls.caller_data->>'name' AS firstname,
            ls.caller_data->>'surname' AS surname,
            ls.caller_data->>'cell' AS cell,
            ls.caller_data->>'home' AS home,
            ls.caller_data->>'work' AS work,
            ls.caller_data->>'other' AS other
        FROM mv.caller_stage ls
        WHERE  ls.caller_stage = 'ACTIVE'
        AND (REPLACE(ls.caller_data->>'caller_expiry_date', '/', '-' ))::DATE > NOW()::DATE
        AND ls.active
        AND caller_data @> jsonb_build_object('worker', 'Alice')
        AND ls.date_inserted >= (NOW()- CONCAT('3 days')::INTERVAL)
        AND ((ls.caller_path_name LIKE 'Mode %' AND  ls.current_status IN (
            'Received',
            'Wrong Number',
            'Does Not Know ID',
            'No Such Person','Unreachable',
            'Under 21',
            'No Permission',
            'Do Not Contact',
            'Not Interested',
            'Does Not Understand',
            'Query',
            'Engaged',
            'No Address',
            'Wrong ID'))

            OR (ls.caller_path_name LIKE 'Method %' AND  ls.current_status IN (
            'Successfully Captured'
            )))

        AND ( _search_text IS  NULL
            OR(
                _search_text IS NOT NULL AND(
                    ls.caller_data->>'surname' ILIKE CONCAT('%',_search_text,'%')
                    OR ls.caller_data->>'name' ILIKE CONCAT('%',_search_text,'%')
                    OR ls.caller_data->>'cell' ILIKE CONCAT('%',_search_text,'%')
                    OR ls.caller_data->>'home' ILIKE CONCAT('%',_search_text,'%')
                    OR ls.caller_data->>'work' ILIKE CONCAT('%',_search_text,'%')
                    OR ls.caller_data->>'other' ILIKE CONCAT('%',_search_text,'%')
                )

        ))

Runtime is still too high so I was asked to further optimize it.

I tried adding relevant indexes, and I was able to get a 3x improvement.

Would like to know if I can further improve the SELECT query performance via modifying any of the clauses, such as the IN or ILIKE clauses?

2

Answers


  1. Consider building partial indexes for particular data subsets. ls.current_status IN (…) and ls.caller_path_name LIKE ‘Mode%’ are two examples of conditions you might use. For each of these circumstances independently, you may make partial indices, which might enhance performance.

    CREATE INDEX idx_active_mode_status ON mv.caller_stage (caller_path_name, current_status) WHERE caller_stage = 'ACTIVE' AND ...;
    CREATE INDEX idx_active_method_status ON mv.caller_stage (caller_path_name, current_status) WHERE caller_stage = 'ACTIVE' AND ...;
    

    Hope it works 🙂

    Login or Signup to reply.
  2. You have sacrificed performance for a simpler data model by using JSON in your table. That is okay. You must live with the fact that accessing data from a table where you want to filter by JSON content is rather slow.

    But there are some columns that you use in your WHERE clause that are not JSON. Let’s look at these and decide which is the most restrictive condition.

    Does ls.caller_stage = 'ACTIVE' limit the number of rows immensely? Does ls.active? Does caller_path_name LIKE 'Mode %'? Does ls.date_inserted >= (NOW()- CONCAT('3 days')::INTERVAL)? The latter is a strange expression by the way. You concat the string ‘3 days’? With what? You make this an interval? Why not INTERVAL '3 days' right away? You subtract this from NOW, thus getting a timestamp that you compare with a date. (At least the name date_inserted suggests a date.) Is this intended?

    I don’t consider current_status IN (...) by the way, because INis like many OR and indexes usually don’t help much with that.

    Let’s suppose that the condition on the date is the most restrictive, so I put this first in the index. Then let’s just say caller_stage is the second restrictive, caller_path_name the third and active the least one. So we build this index:

    create index idx on mv.caller_stage (date_inserted, caller_stage, caller_path_name, active);
    

    This index is likely to help the DBMS find the data quickly. However, it may decide not to use it. That would be the case for instance, if there were only four days of data in the table, of which you want three.

    Disclaimer: I am not a PostgreSQL developer. Postgres offers a lot that many other DBMS don’t. It is probable that they offer ways to index JSON data, and this may or may not help here.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search