I’m trying to add a new endpoint that does full-text search with AND, OR, NOT
operators and also tolerates typos with TrigramSimilarity.
I came across this question: Combine trigram with ranked searching in django 1.10 and was trying to use that approach, but SearchRank is not behaving as I’d expect, and I’m confused about how it works.
When my code looks like the basic implementation of full-text search, the negative filter works fine:
    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        qs = Project.objects.annotate(
            search=vector,
        ).filter(
            search=query,
        )
        return Response({
            "results": qs.values()
        })
But I need to implement this using SearchRank so I can later do some logic with the rank score and the similarity score.
This is what my code looks like when annotating the rank instead of the tsvector:
    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)
        qs = Project.objects.annotate(
            rank=rank,
        ).order_by("-rank")
        return Response({
            "results": qs.values()
        })
With the query apt29 -graph, the document named "APT29 Attack Graph" gets a rank of 1. I’d expect the - operator to rank it lower, ideally 0.
Does SearchRank not take into consideration any search operators?
This is the PostgreSQL query plan for the queryset:
    Sort  (cost=37.78..37.93 rows=62 width=655)
      Sort Key: (ts_rank_cd(setweight(to_tsvector(COALESCE(name, ''::text)), 'A'::"char"), websearch_to_tsquery('apt29 -graph'::text))) DESC
      ->  Seq Scan on firedrill_project  (cost=0.00..35.93 rows=62 width=655)
Also, if there is a better way to do this kind of search without introducing new dependencies (Elasticsearch, Haystack, etc.), please reference it.
I tried different search operators and looked for alternative ways to do this, but I have had no success so far.
2 Answers
Django’s SearchRank does not take the search operators into account because it only calculates the rank based on how well the search query matches the documents. Let’s use SearchQuery to filter the results based on the search operators and use TrigramSimilarity to calculate the similarity score.
Edit: this now takes into account both full-text search and trigram similarity.
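To make the typo tolerance concrete, here is a rough pure-Python model of the score that pg_trgm (the extension behind TrigramSimilarity) computes. It is only a sketch: the function names `trigrams` and `similarity` are illustrative, and it assumes ASCII letters and digits as word characters, whereas PostgreSQL's notion of a word character depends on the database locale.

```python
import re


def trigrams(text):
    """Return the set of trigrams for a string, roughly as pg_trgm builds them."""
    grams = set()
    # Lowercase, split on non-alphanumerics (ASCII-only assumption),
    # and pad each word with two leading spaces and one trailing space.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity("atack", "attack"))  # 0.625
```

A single missing letter still leaves most trigrams in common, which is why a misspelled query can score well against the intended name even when the full-text query matches nothing.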
I’ve already used the search rank with trigram similarity in the past.
To filter the search results you can use the "websearch" syntax of the search query matching the search vector.
Finally, you can sort the already filtered search results using a combination (e.g. sum) of the search rank and trigram similarity.
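As a minimal sketch of the two steps above, assuming the `name` field from the question and that the pg_trgm extension is installed in the database: the helper name `ranked_search` and the annotation names `similarity` and `score` are made up for illustration, and the plain sum is just one possible way to combine the two scores.

```python
from django.contrib.postgres.search import (
    SearchQuery,
    SearchRank,
    SearchVector,
    TrigramSimilarity,
)
from django.db.models import F


def ranked_search(queryset, raw_query):
    """Filter with the websearch query, then order by rank plus similarity."""
    vector = SearchVector("name", weight="A")
    query = SearchQuery(raw_query, search_type="websearch")
    return (
        queryset.annotate(
            search=vector,
            rank=SearchRank(vector, query, cover_density=True),
            similarity=TrigramSimilarity("name", raw_query),
        )
        # The filter is what enforces the operators, including "-term";
        # the rank only scores the rows that survive it.
        .filter(search=query)
        # Hypothetical combination: a simple sum of both scores.
        .annotate(score=F("rank") + F("similarity"))
        .order_by("-score")
    )
```

In the view from the question this would be called as `qs = ranked_search(Project.objects.all(), search_query)`. Note that filtering strictly on the full-text match drops rows that only match through a typo. Admitting those with an extra similarity threshold is possible, but it can also let excluded rows back in, since a query like `apt29 -graph` is still trigram-similar to "APT29 Attack Graph".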
You can find another example of use in this old article of mine:
Another example of this combination that you might be interested in is the Django site documentation search (which I wrote):