
I’m trying to add a new endpoint that does full-text search with AND, OR, and NOT operators and also tolerates typos using TrigramSimilarity.

I came across the question "Combine trigram with ranked searching in django 1.10" and tried to use that approach, but SearchRank is not behaving as I’d expect, and I’m confused about how it works.

When my code looks like the basic full-text search implementation, the negative filter works fine:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        qs = Project.objects.annotate(
            search=vector,
        ).filter(
            search=query,
        )
        return Response({
            "results": qs.values()
        })

[Image: the returned documents]

But I need to implement this using SearchRank so I can later do some logic with the rank score and the similarity score.

This is what my code looks like annotating for rank instead of using the tsvector annotation:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)

        qs = Project.objects.annotate(
            rank=rank,
        ).order_by("-rank")
        return Response({
            "results": qs.values()
        })

And the response looks like:
[Image: the documents I got back]

The rank given to the document named "APT29 Attack Graph" is 1. I’d expect the - operator would rank it lower, ideally 0.

Does SearchRank not take into consideration any search operators?

This is the PostgreSQL query plan for the queryset:

    Sort (cost=37.78..37.93 rows=62 width=655)
      Sort Key: (ts_rank_cd(setweight(to_tsvector(COALESCE(name, ''::text)), 'A'::"char"), websearch_to_tsquery('apt29 -graph'::text))) DESC
      ->  Seq Scan on firedrill_project (cost=0.00..35.93 rows=62 width=655)

Also if there is a better way to do this kind of search without introducing new dependencies (Elasticsearch, haystack, etc) please reference it.

I tried different search operators and looked for alternative ways to do this, but I have had no success so far.

2 Answers


  1. Django’s SearchRank does not apply the search operators as a filter. It only scores how well the query terms match each document, so annotating with the rank alone still returns every row, including rows that the - operator should exclude.

    Let’s use SearchQuery to filter the results based on the search operators and use TrigramSimilarity to calculate the similarity score.

    Edit: the code now takes both full-text search rank and trigram similarity into account.

    from django.contrib.postgres.search import (
        SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
    )
    from django.db.models import F
    from rest_framework import viewsets
    from rest_framework.decorators import action
    from rest_framework.response import Response

    # Project and ProjectSerializer are imported from your own app.
    
    class ProjectViewSet(viewsets.ModelViewSet):
        queryset = Project.objects.all()
        serializer_class = ProjectSerializer
    
        @action(detail=False, methods=["get"])
        def search(self, request, *args, **kwargs):
            search_query = request.query_params.get("search")
            vector = SearchVector("name", weight="A")
            query = SearchQuery(search_query, search_type="websearch")
    
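            # Filter with the websearch query so operators such as "-" exclude rows,
            # then score the rows that remain.
            projects = Project.objects.annotate(
                search=vector,
                rank=SearchRank(vector, query),
                similarity=TrigramSimilarity('name', search_query),
            ).filter(search=query)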
    
            projects = projects.annotate(
                combined_score=F('rank') * F('similarity'),
            ).order_by('-combined_score')
    
            return Response({
                "results": projects.values()
            })
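One thing to watch with the multiplicative combined_score above: if either factor is 0, the product is 0. A document that shares no lexeme with the query (for example because of a typo) would typically get a rank of 0, so multiplying can throw away the trigram signal. The sketch below uses made-up numbers, not output from PostgreSQL, and combine_product / combine_sum are hypothetical helper names used only for illustration.

```python
def combine_product(rank: float, similarity: float) -> float:
    """Multiplicative combination: any zero factor zeroes the whole score."""
    return rank * similarity


def combine_sum(rank: float, similarity: float) -> float:
    """Additive combination: each signal still contributes on its own."""
    return rank + similarity


# Hypothetical (rank, similarity) pairs, invented for this illustration.
scores = {
    "exact match": (0.6, 0.5),
    "typo match": (0.0, 0.4),
    "unrelated": (0.0, 0.0),
}

for label, (rank, similarity) in scores.items():
    product = combine_product(rank, similarity)
    total = combine_sum(rank, similarity)
    print(f"{label}: product={product:.2f} sum={total:.2f}")
```

With the product, the "typo match" row is indistinguishable from the "unrelated" row, while the sum keeps them apart. Which combination is appropriate depends on how you want the two signals to interact.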
    
  2. I’ve already used the search rank with trigram similarity in the past.

    To filter the search results you can use the "websearch" syntax of the search query matching the search vector.

    Finally, you can sort the already filtered search results using a combination (e.g. sum) of the search rank and trigram similarity.

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)
        similarity = TrigramSimilarity("name", search_query)
        qs = Project.objects.annotate(
            search=vector,
            order=rank + similarity,
        ).filter(
            search=query,
        ).order_by("-order")
        return Response({"results": qs.values()})
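For intuition about what TrigramSimilarity contributes to the ordering, here is a rough pure-Python approximation of how the pg_trgm extension scores two strings: each word is lowercased and padded with two leading spaces and one trailing space, split into three-character windows, and the score is the number of shared trigrams divided by the number of distinct trigrams overall. The helper names trigrams and similarity are hypothetical, and this sketch only handles ASCII letters and digits; it is not the extension's actual implementation.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm trigram extraction (ASCII letters and digits only)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity("attack", "atack"))  # 0.625: a one-letter typo still scores high
print(similarity("attack", "zebra"))  # 0.0: no trigram in common
```

Note that in the view above, rows still have to pass the search=query filter, so the similarity term only influences the ordering among rows that already match the full-text query.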
    

    You can find another example of use in this old article of mine:

    Another example of this combination that you might be interested in is the Django site documentation search (which I wrote):
