I have an index in Azure AI Search that contains one English term (e.g. "white wine", "grapes", "chocolate cake", …) per document, plus a vector field. Indexing ran without problems for 100k documents.
My use case is to find the closest term to one entered by the user and give the match a score (0-100%).
When I run the following query on Search Explorer on Azure Portal for my index:
{
  "search": "Winery products",
  "count": true,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "Winery products",
      "fields": "vectorTextEnglish"
    }
  ]
}
I get the right results. Note that the top score is 0.031:
{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 75,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "Winery products",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.0317540317773819,
      "chunk_id": "xxxx",
      "Term": "Alcoholic wines",
      "MyReference": "00123",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.03159204125404358,
      ...
    },
However, if I query with a random string such as asdfjiwefowfwe, I get a very similar top score, 0.030:
{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 93,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "asdfjiwefowfwe",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.03083491325378418,
      "chunk_id": "xxxxxx",
      "Term": "Ash",
      "MyReference": "00422",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.029877368360757828,
      ...
    },
I would like to normalize the match score to 0-100, but I don't understand how a random string gets the same score as a good match. Can anyone help me understand this and guide me on how to give a higher score to a good match and 0 to random strings?
I tried setting some thresholds, but since the scores are so close to each other, this is impossible. I tried semantic ranking, but it is even more confusing: these random strings get a reranking score of 1.8 while a perfect match gets perhaps 2.4.
2 Answers
To address the issue of similar scores in Azure AI Search when working with large datasets, you can add a scoring profile to the vector index.
I referred to this guide on creating a vector index in Azure AI Search.
Scoring profiles let you apply weights to relevant fields: matches in the weighted fields become more influential than matches in less relevant ones.
In your case, you should add weight to the Term field. For example, a scoring profile named sampath could assign a weight of 5 to the Term field.
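A minimal sketch of what such a scoring profile could look like in the index definition (the profile name sampath and the weight of 5 follow the answer above; the index and field names are taken from the question). Note that text weights only boost the keyword part of a hybrid query; they do not change the vector similarity itself:

```json
{
  "name": "myindex",
  "scoringProfiles": [
    {
      "name": "sampath",
      "text": {
        "weights": {
          "Term": 5
        }
      }
    }
  ],
  "defaultScoringProfile": "sampath"
}
```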
Use the parameter "debug": "all" in your request. You will then get a new property in the response, like "vectorSimilarity": 0.998, that goes from 0 to 1. In most cases you can then ignore the keyword score, since vector search is very accurate. Semantic ranking is overkill for most use cases.
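Once you have a similarity in the 0-1 range, mapping it to the 0-100 scale the question asks for is straightforward. A minimal sketch in Python, assuming you read the per-document similarity out of each result; the 0.80 cutoff is an arbitrary assumption you would tune against real good matches and random strings from your own data:

```python
def match_percentage(vector_similarity: float, cutoff: float = 0.80) -> float:
    """Map a 0-1 vector similarity to a 0-100 match score.

    Anything below the cutoff is treated as "no match" and scored 0;
    the remaining range [cutoff, 1.0] is stretched linearly to 0-100.
    The cutoff is an assumption to be tuned on real queries.
    """
    if vector_similarity < cutoff:
        return 0.0
    return round((vector_similarity - cutoff) / (1.0 - cutoff) * 100, 1)


# A good match scores high; a random string falls below the cutoff:
print(match_percentage(0.95))  # → 75.0
print(match_percentage(0.55))  # → 0.0 (treated as no match)
```

Tune the cutoff by looking at the similarity values your known-good queries and a few nonsense strings actually produce; the gap between them is usually much clearer in the raw similarity than in the fused @search.score.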