I’m working on an express.js project, using Sequelize as the orm, and postgres as the engine.
I’m working on improving the performance of a slow select query that does many things, and one of them is comparing the acronym of the name of the warehouse against a search term using ilike, so if the search term is "STW", a warehouse named "Super Test Warehouse" will match. The query looks like this:
SELECT field1, field2, ...
FROM package p
LEFT OUTER JOIN warehouse w ON p.warehouse_id = w.id
-- many other joins
WHERE
regexp_replace(w.name, '([a-z ])+', '', 'g') ILIKE '%search term%'
-- many other conditions
The thing is, the main improvement I’m doing is creating indexes (gin for everything that does ilike '%something%'
), and refactoring the query to be able to use them. But the regexp_replace()
function doesn’t work with indexes. Another important thing to note is that this solution only works if all the words in the names are correctly capitalized, which is the case with most of our data, but not all, so if there’s a warehouse named "Super TEST Warehouse", or "super test Warehouse", it won’t work correctly.
A simple solution I thought of, is to do this (assuming the search text is "STW"):
w.name ILIKE 'S% T% W%' OR w.name ILIKE '% S% T% W%'
It works on the average case, and it uses the gin index I created. But the problem is that it allows other words in between. So "STW" would also match "Super Fake Test Warehouse".
Is there any condition that would match the acronym correctly, and would use an index?
Would you recommend using a generated column for this (assuming I improve the existing regexp_replace, or use something else)? Are there any downsides of doing that? Ideally, I don’t want to alter a table only for this feature, but could be a good solution if there’s nothing else we can do.
Any other solutions?
2
Answers
What I'm doing in the end is making a generated column like this:
I also have this index for name:
First, this solution is more accurate for the acronyms, since it actually takes the first letter of each word. And also, ilike queries against the generated column will make use the index.
So create an expression index:
The expression extracts upper-case acronyms reliably (except for trailing non-word characters in your strings):
The involved functions are immutable, so fit for an expression index.
Repeat the expression in your queries to bring in the index:
I would prefer to add a generated column, though. Simpler query and a bit faster, yet, for only very few added bytes per row. See: