skip to Main Content

I have a big table with words (let’s call it words). It consists of three columns, eg.:

id, word, sorted_word
1 , dog , dgo
2 , god , dgo
3 , cat , act
etc.

To find anagrams of o word I can do a simple search by the sorted_word column: SELECT word FROM words WHERE sorted_word='dgo'.
But to find anagrams with wildcard (think of having a blank tile in Scrabble), I would have to add some OR logic. For example to find all anagrams from letters D, O, G and any other letter, I would have to make that query:

SELECT word FROM words WHERE sorted_word LIKE '_dgo' OR sorted_word LIKE 'd_go' OR sorted_word LIKE 'dg_o' OR sorted_word LIKE'dgo_';  
  1. Is it possible to create index in PostgreSQL for such queries?
  2. If SQL is not suitable for this kind of problem, what other database would be better?

2

Answers


  1. For words of length 5 or more, you could use a trigram index:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    
    CREATE INDEX ON words USING gin (sorted_word gin_trgm_ops)
       WHERE length(sorted_word) >= 5;
    

    Then you could query like this:

    SELECT word FROM words
    WHERE sorted_word LIKE '_floo' AND length(sorted_word) >= 5
       OR sorted_word LIKE 'd_loo' AND length(sorted_word) >= 5
       OR sorted_word LIKE 'df_oo' AND length(sorted_word) >= 5
       OR sorted_word LIKE 'dfl_o' AND length(sorted_word) >= 5
       OR sorted_word LIKE 'dflo_' AND length(sorted_word) >= 5;
    

    and the trigram index should speed up the search.

    It appears that the middle pattern df_oo doesn’t contain a trigram, but pg_trgm prepends two spaces to each word and appends one space, so there would be the trigrams d, df and oo , which together should be selective enough.

    For shorter words an index won’t help directly, but you can use a partial index to reduce the search space, and there are not so many short words anyway.

    CREATE INDEX ON words (sorted_word) WHERE length(sorted_word) < 5;
    

    Then search with

    SELECT word FROM words
    WHERE sorted_word LIKE '_foo' AND length(sorted_word) < 5
       OR sorted_word LIKE 'd_oo' AND length(sorted_word) < 5
       OR sorted_word LIKE 'df_o' AND length(sorted_word) < 5
       OR sorted_word LIKE 'dfo_' AND length(sorted_word) < 5;
    
    Login or Signup to reply.
  2. I made https://scrabblecheat.com years and years ago. When I did my lookup I used And Not instead of and. Made it easier and fast. I don’t know if this helps though.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search