skip to Main Content

I’m using postgre full text search to search among a text built out of tags, address, name etc…
I’m doing some tests to see how it behave, but I cannot really understand it.
What I tried:

The following queries returns true

select to_tsvector('english', 'animals') @@ to_tsquery('english', 'anim:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'animal:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'animals:*')

While these ones not

select to_tsvector('english', 'animals') @@ to_tsquery('english', 'a:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'an:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'ani:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'anima:*')

I guess the first 3 return false because it see the input as stop words?
But for the last query, I have no idea why is returning false.

Any inputs?

Thank you

3

Answers


  1. tsvectors work with lexemes. This means words are brought back to there most basic form. This means that words as animals is brought back to anim. This is not done on logic rules, but based on dictionairy. In this case the english dictionairy. it contains animals –> anim. It also contains anima –> anima (since anima is an english word with a diffrent lexemes), and ani –> ani (same story.

    So when you are trying to make your compares, you are realy looking if certain terms are present in the list of lexemes of your text. Since your text is only one word (animals) it is only a list of one lexemes (anim).

    The 7 values you compare with it are:

    • anim (lexemes anim) –> true (is in list of lexemes)

    • animal (lexemes anim) –> true (is in list of lexemes)

    • animals (lexemes anim) –> true (is in list of lexemes)

    • a (lexems non existing because of stopword) –> false

    • an (lexems non existing because of stopword) –> false

    • ani (lexemes ani) –> false (is not in list of lexemes)

    • anima (lexemes anima) –> false (is not in list of lexemes)

    Hope this explains it

    good luck with your project

    (see also https://www.postgresql.org/docs/current/textsearch-controls.html )

    Login or Signup to reply.
  2. Here’s postgres documentation that explains full text search with very helpful examples
    Docs link

    Login or Signup to reply.
  3. The stemmer truncates "animal" like this:

    select to_tsvector('english', 'animals');
    
     to_tsvector 
    ═════════════
     'anim':1
    (1 row)
    

    That explains why a search for anima:* fails (one a too many).

    Searching for ani:* works just fine, different from what you say.

    Searching for a:* and an:* finds nothing, since these words are stop words in English and get eliminated.

    What you see is that Full Text Search is not ideal for prefix search.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search