I have an Azure AI Search index that contains one English term per document (e.g. "white wine", "grapes", "chocolate cake", …). The index has a vector field. Indexing ran without problems for 100k documents.

My use case is to find the term closest to the one the user enters and to score the match from 0 to 100%.
When I run the following query against my index in Search Explorer in the Azure portal:

{
  "search": "Winery products",
  "count": true,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "Winery products",
      "fields": "vectorTextEnglish"
    }
  ]
}

I get the right results. Note that the top score is only 0.0317:

{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 75,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "Winery products",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.0317540317773819,
      "chunk_id": "xxxx",
      "Term": "Alcoholic wines",
      "MyReference": "00123",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.03159204125404358,
      ...
    },

However, if I search for a random string such as asdfjiwefowfwe, I get a very similar top score, 0.0308:

{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 93,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",    
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "asdfjiwefowfwe",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.03083491325378418,
      "chunk_id": "xxxxxx",
      "Term": "Ash",
      "MyReference": "00422",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.029877368360757828,
      ...
    },

I would like to normalize the match score to a 0-100 range, but I don't understand how a random string can get almost the same score as a good match. Can anyone help me understand this and show me how to give a high score to good matches and 0 to random strings?

I tried setting thresholds, but the scores are so close to each other that this is impossible. I also tried semantic ranking, but it is even more confusing: random strings get a reranker score of about 1.8, while a perfect match gets perhaps 2.4.

2 Answers


  1. To address the issue of similar scores in Azure AI Search when working with large datasets, you can add a scoring profile to the vector index.

    I referred to this guide on creating a vector index in Azure AI Search.

    Using scoring profiles enables you to apply weights to relevant fields.

    For example, you might define a scoring profile that gives more weight to matches in specific fields, making them more influential than less relevant fields.

    I followed this documentation to add scoring profiles in Azure AI Search.

    In your case, you should add a weight to the Term field.

    The sample scoring profile, named sampath, assigns a text weight of 5 to the Term field. The tag scoring function is left commented out.

    "scoringProfiles": [
        {
            "name": "sampath",
            "functionAggregation": "sum",
            "text": {
                "weights": {
                    "Term": 5
                }
            },
            // "functions": [
            //   {
            //       "fieldName": "Term",
            //       "interpolation": "linear",
            //       "type": "tag",
            //       "boost": 5,
            //       "freshness": null,
            //       "magnitude": null,
            //       "distance": null,
            //       "tag": {
            //           "tagsParameter": "tag"
            //       }
            //   }
            // ]
        }
    ]
    
    
    

    JSON query:

    {
      "search": "Winery products",
      "scoringProfile": "sampath",
      "count": true
    }
    

  2. Add the parameter "debug": "all" to your request. The response then includes a new property such as "vectorSimilarity": "0.998", which ranges from 0 to 1. In most cases you can then ignore the keyword score, since vector search is very accurate. Semantic ranking is overkill for most use cases.
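
The suggestion in the second answer could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names, the `floor` default of 0.80, and the idea of searching the whole result document for a `vectorSimilarity` key are hypothetical and not taken from the Azure documentation. The exact location of the property in the response, and which API versions accept `debug`, should be checked against the service. The request body omits "search" so that only the vector query runs.

```python
from typing import Any, Iterator


def build_debug_query(text: str, vector_field: str) -> dict[str, Any]:
    """Build a vector-only request body with the debug option from the answer."""
    return {
        "count": True,
        "debug": "all",  # value taken from the answer; support may depend on the API version
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": vector_field}
        ],
    }


def find_vector_similarities(node: Any) -> Iterator[float]:
    """Yield every 'vectorSimilarity' value found anywhere in a result document.

    Walking the whole structure avoids hard-coding the nesting of the debug
    info, which is an assumption this sketch does not want to make.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "vectorSimilarity":
                yield float(value)
            else:
                yield from find_vector_similarities(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_vector_similarities(item)


def similarity_to_percent(similarity: float, floor: float = 0.80) -> float:
    """Map a similarity in [floor, 1.0] linearly onto 0-100; below floor is 0.

    The floor is a hypothetical cut-off: calibrate it by checking which
    similarity random strings actually reach against your own index.
    """
    if similarity <= floor:
        return 0.0
    clipped = min(similarity, 1.0)
    return round((clipped - floor) / (1.0 - floor) * 100, 1)


if __name__ == "__main__":
    # Hypothetical result document; only the 'vectorSimilarity' key matters here.
    doc = {
        "Term": "Alcoholic wines",
        "debugInfo": {"fields": [{"vectorTextEnglish": {"vectorSimilarity": "0.998"}}]},
    }
    best = max(find_vector_similarities(doc), default=0.0)
    print(doc["Term"], similarity_to_percent(best))
```

With a linear rescale like this, everything at or below the floor collapses to 0, which is what the question asks for with random strings, while good matches are spread over the remaining range instead of sitting within a few thousandths of each other.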
