I’m using postgre full text search to search among a text built out of tags, address, name etc…
I’m doing some tests to see how it behave, but I cannot really understand it.
What I tried:
The following queries returns true
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'anim:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'animal:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'animals:*')
While these ones not
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'a:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'an:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'ani:*')
select to_tsvector('english', 'animals') @@ to_tsquery('english', 'anima:*')
I guess the first 3 return false because it see the input as stop words?
But for the last query, I have no idea why is returning false.
Any inputs?
Thank you
3
Answers
tsvectors work with lexemes. This means words are brought back to there most basic form. This means that words as animals is brought back to anim. This is not done on logic rules, but based on dictionairy. In this case the english dictionairy. it contains animals –> anim. It also contains anima –> anima (since anima is an english word with a diffrent lexemes), and ani –> ani (same story.
So when you are trying to make your compares, you are realy looking if certain terms are present in the list of lexemes of your text. Since your text is only one word (animals) it is only a list of one lexemes (anim).
The 7 values you compare with it are:
anim (lexemes anim) –> true (is in list of lexemes)
animal (lexemes anim) –> true (is in list of lexemes)
animals (lexemes anim) –> true (is in list of lexemes)
a (lexems non existing because of stopword) –> false
an (lexems non existing because of stopword) –> false
ani (lexemes ani) –> false (is not in list of lexemes)
anima (lexemes anima) –> false (is not in list of lexemes)
Hope this explains it
good luck with your project
(see also https://www.postgresql.org/docs/current/textsearch-controls.html )
Here’s postgres documentation that explains full text search with very helpful examples
Docs link
The stemmer truncates "animal" like this:
That explains why a search for
anima:*
fails (onea
too many).Searching for
ani:*
works just fine, different from what you say.Searching for
a:*
andan:*
finds nothing, since these words are stop words in English and get eliminated.What you see is that Full Text Search is not ideal for prefix search.