I have a big table with words (let’s call it words
). It consists of three columns, eg.:
id, word, sorted_word
1 , dog , dgo
2 , god , dgo
3 , cat , act
etc.
To find anagrams of o word I can do a simple search by the sorted_word
column: SELECT word FROM words WHERE sorted_word='dgo'
.
But to find anagrams with wildcard (think of having a blank tile in Scrabble), I would have to add some OR logic. For example to find all anagrams from letters D, O, G and any other letter, I would have to make that query:
SELECT word FROM words WHERE sorted_word LIKE '_dgo' OR sorted_word LIKE 'd_go' OR sorted_word LIKE 'dg_o' OR sorted_word LIKE'dgo_';
- Is it possible to create index in PostgreSQL for such queries?
- If SQL is not suitable for this kind of problem, what other database would be better?
2
Answers
For words of length 5 or more, you could use a trigram index:
Then you could query like this:
and the trigram index should speed up the search.
It appears that the middle pattern
df_oo
doesn’t contain a trigram, butpg_trgm
prepends two spaces to each word and appends one space, so there would be the trigramsd
,df
andoo
, which together should be selective enough.For shorter words an index won’t help directly, but you can use a partial index to reduce the search space, and there are not so many short words anyway.
Then search with
I made https://scrabblecheat.com years and years ago. When I did my lookup I used And Not instead of and. Made it easier and fast. I don’t know if this helps though.