Clearly I have some misunderstandings about the topic. I would appreciate if you correct my mistakes.
So as explained in PostgreSQL documentation, We need to do Full-Text Searching instead of using simple textual search operators.
Suppose I have a blog application in Django.
Entry.objects.filter(body_text__search="Cheese")
The bottom line is we have "document"s which are our individual records in blog_post
field and a term "Cheese"
.
Individual documents are gonna be translated to something called "tsvector"(a vector of simplified words) and also a "tsquery" is created out of our term.
-
If I have no
SearchVectorField
field and noSearchVector
index:for every single record in
body_text
field, atsvector
is created and it’s checked against ourtsquery
, in failure, we continue to the next record. -
If I have
SearchVectorField
field but notSearchVector
index:that
tsvector
vector is stored inSearchVectorField
field. So the searching process is faster because we only check for match not creating tsvector anymore, but still we’re checking every single record one by one. -
If I have both
SearchVectorField
field andSearchVector
index:a GIN index is created in database, it’s somehow like a dictionary:
"cat": [3, 7, 18], ...
. It stores the occurrences of the "lexems"(words) so that we don’t have to iterate through all the records in the database. I think this is the fastest option. -
Now if I have only
SearchVector
index:we have all the benefits of number 3.
Then why should I have SearchVectorField
field in my table? IOW why do I need to store tsvector
if I already have it indexed?
Django documentation says:
If this approach becomes too slow, you can add a
SearchVectorField
to your model.
Thanks in advance.
2
Answers
Use No
SearchVectorField
and noSearchVector
index when:Use
SearchVectorField
withoutSearchVector
index when:Use
SearchVectorField
withSearchVector
index when:ONLY use
SearchVector
index when:Edit:
Yes, your statements were correct in describing the different scenarios and their implications.
In the case of your question, from a speed perspective, having just a
SearchVector
index can indeed be sufficient in terms of search performance. TheSearchVector
index allows for efficient searching by leveraging the index structure, which speeds up the search process by avoiding the need to iterate through all the records.In this case, having a
SearchVectorField
is not strictly necessary for achieving good search performance. The primary benefit of theSearchVectorField
is that it allows you to store the tsvector values directly in the database, which can be useful if you need direct access to those values for other purposes. However, if you solely care about search speed and don’t need direct access to the tsvector values, having only theSearchVector
index is sufficient.Hoped this helped.
I assume an index without a field on the Django side means that it has a functional index. That is fine if your work_mem is large enough to hold the bitmap and you are only doing simple searches like single-word or
&&
. But if you are going proximity searches like<->
, or if your work_mem is too small, it will need to do "rechecks" on potential matches, and that means the document would need to be parsed again.