I have a database with over 30,000,000 entries. When performing queries (including an ORDER BY clause) on a text field, the = operator returns results relatively fast. However, we have noticed that when using the LIKE operator, the query becomes remarkably slow, taking minutes to complete. For example:
SELECT * FROM work_item_summary WHERE manager LIKE '%manager' ORDER BY created;
Creating indices on the keywords being searched will of course greatly speed up the query. The problem is that we must support queries on any arbitrary pattern, and on any column, so this solution is not viable.
My questions are:
- Why are LIKE queries so much slower than = queries?
- Is there any other way these generic queries can be optimized, or is this about as good as one can get for a database with so many entries?
2 Answers
Your query plan shows a sequential scan, which is slow for big tables, and also not surprising since your LIKE pattern has a leading wildcard that cannot be supported with a plain B-tree index.

You need to add index support: either a trigram GIN index to support any and all patterns, or a COLLATE "C" B-tree expression index on the reversed string to specifically target leading wildcards.
One technique to speed up queries that search for a partial word (e.g. '%something%') is the rotational indexing technique, which is not implemented in most RDBMSs.

This technique consists of cutting out each word of a sentence and then cutting it in "rotation", i.e. creating a list of parts of this word from which the first letter is successively removed.

As an example, the word "electricity" will be exploded into 10 words:
lectricity
ectricity
ctricity
tricity
ricity
icity
city
ity
ty
y
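In PostgreSQL, the expansion above can be sketched with a recursive CTE (the word 'electricity' is hard-coded here for illustration):

```sql
-- Generate every "rotation" of a word by repeatedly
-- stripping the first letter until one character remains
WITH RECURSIVE rotations AS (
  SELECT substr('electricity', 2) AS part
  UNION ALL
  SELECT substr(part, 2) FROM rotations
  WHERE length(part) > 1
)
SELECT part FROM rotations;  -- lectricity, ectricity, ..., ty, y
```

In practice this generation would run in a trigger or in application code at insert time, so the dictionary stays in sync with the indexed text.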
Then you put all those words into a dictionary, which is a simple table with an index… and reference the root word.
Tables for this are: