I want to optimize the following DB SELECT query:
SELECT
ls.caller_id,
ls.caller_path_id,
ls.caller_path_name,
ls.caller_data->>'worker' AS worker,
ls.caller_stage,
ls.current_status,
ls.date_inserted AS date_closed,
ls.caller_data->>'name' AS firstname,
ls.caller_data->>'surname' AS surname,
ls.caller_data->>'cell' AS cell,
ls.caller_data->>'home' AS home,
ls.caller_data->>'work' AS work,
ls.caller_data->>'other' AS other
FROM mv.caller_stage ls
WHERE ls.caller_stage = 'ACTIVE'
AND (REPLACE(ls.caller_data->>'caller_expiry_date', '/', '-' ))::DATE > NOW()::DATE
AND ls.active
AND caller_data @> jsonb_build_object('worker', 'Alice')
AND ls.date_inserted >= (NOW()- CONCAT('3 days')::INTERVAL)
AND ((ls.caller_path_name LIKE 'Mode %' AND ls.current_status IN (
'Received',
'Wrong Number',
'Does Not Know ID',
'No Such Person','Unreachable',
'Under 21',
'No Permission',
'Do Not Contact',
'Not Interested',
'Does Not Understand',
'Query',
'Engaged',
'No Address',
'Wrong ID'))
OR (ls.caller_path_name LIKE 'Method %' AND ls.current_status IN (
'Successfully Captured'
)))
AND ( _search_text IS NULL
OR(
_search_text IS NOT NULL AND(
ls.caller_data->>'surname' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'name' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'cell' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'home' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'work' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'other' ILIKE CONCAT('%',_search_text,'%')
)
))
Runtime is still too high so I was asked to further optimize it.
I tried adding relevant indexes, and I was able to get a 3x improvement.
I'd like to know if I can further improve the SELECT query's performance by modifying any of the clauses, such as the IN or ILIKE clauses.
2 Answers
Consider building partial indexes for particular data subsets. ls.current_status IN (...) and ls.caller_path_name LIKE 'Mode %' are two conditions you could target. Creating a separate partial index for each of these conditions might improve performance.
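For instance (a sketch, not a drop-in solution: the index name is invented, and the columns are taken from the query in the question), a partial index restricted to the rows this query can ever match might look like:

```sql
-- Hypothetical partial index: only rows satisfying the fixed filters are stored,
-- so the index stays small and the planner can use it for this query's shape.
CREATE INDEX idx_caller_stage_active_mode
    ON mv.caller_stage (date_inserted)
    WHERE caller_stage = 'ACTIVE'
      AND active
      AND caller_path_name LIKE 'Mode %';
```

Run EXPLAIN (ANALYZE) on the query afterwards to confirm the planner actually picks it up.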
Hope it works 🙂
You have sacrificed performance for a simpler data model by using JSON in your table. That is okay. You must live with the fact that accessing data from a table where you want to filter by JSON content is rather slow.
But there are some columns in your WHERE clause that are not JSON. Let's look at these and decide which is the most restrictive condition. Does ls.caller_stage = 'ACTIVE' limit the number of rows immensely? Does ls.active? Does caller_path_name LIKE 'Mode %'? Does ls.date_inserted >= (NOW() - CONCAT('3 days')::INTERVAL)?

The latter is a strange expression, by the way. You concatenate the string '3 days' with what, exactly? Why not write INTERVAL '3 days' right away? You subtract this from NOW(), thus getting a timestamp that you compare with a date (at least the name date_inserted suggests a date). Is this intended?

I don't consider current_status IN (...), by the way, because IN behaves like many ORs, and indexes usually don't help much with that.

Let's suppose that the condition on the date is the most restrictive, so I put it first in the index. Then let's say caller_stage is the second most restrictive, caller_path_name the third, and active the least restrictive. So we build this index:

This index is likely to help the DBMS find the data quickly. However, the planner may still decide not to use it. That would be the case, for instance, if the table contained only four days of data, of which you want three.
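A multicolumn index matching that ordering might look like this (a sketch: the index name is invented, and the column names come from the question's query):

```sql
-- Most restrictive column first, least restrictive last, as argued above.
CREATE INDEX idx_caller_stage_multi
    ON mv.caller_stage (date_inserted, caller_stage, caller_path_name, active);
```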
Disclaimer: I am not a PostgreSQL developer. Postgres offers a lot that many other DBMSs don't. It probably offers ways to index JSON data, and this may or may not help here.
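PostgreSQL does indeed support indexing JSON: a GIN index on a jsonb column can serve containment filters such as the query's caller_data @> jsonb_build_object('worker', 'Alice'). A sketch (index name invented):

```sql
-- GIN index on the jsonb column; supports the @> containment operator.
-- jsonb_path_ops makes the index smaller but limits it to @> queries.
CREATE INDEX idx_caller_data_gin
    ON mv.caller_stage USING GIN (caller_data jsonb_path_ops);
```

Note that the ->> extractions combined with ILIKE '%...%' are not helped by such an index; those would need a trigram (pg_trgm) index instead.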