
Searching for entries in a keyword field stored in an Azure Cognitive Search index does not return the expected results when searching for them in a long text. Multi-word tokens such as ‘microsoft azure’ are not returned as a match when searching with the text "This text contains microsoft azure"

I'm working with Azure Cognitive Search through the Python SDK. Say I’m building a search index where each document has a "name" field. The name (which can consist of multiple words) only makes sense if the entire name is tokenized as one token, so I use the "keyword_v2" tokenizer when building the analyzer for this field.

from azure.search.documents.indexes.models import (
    CustomAnalyzer,
    SearchableField,
    SearchFieldDataType,
    SimpleField,
)

# Define the custom analyzer for the name field:
# keyword_v2 keeps the whole value as one token, lowercase normalizes case
name_analyzer = CustomAnalyzer(
    name="name_analyzer",
    tokenizer_name="keyword_v2",
    token_filters=["lowercase"],
)

# Specify the index schema
fields = [
    SimpleField(name="key", type=SearchFieldDataType.String, key=True),
    SearchableField(name="name", type=SearchFieldDataType.String, analyzer_name="name_analyzer", searchable=True),
]

This works as expected when I test the analyzer using the REST API. As an example, I have the following indexed entries in the name field: [‘microsoft azure’, ‘amazon aws’, ‘google cloud’]. The custom analyzer I set up correctly tokenizes each entry as one token and not as multiple tokens (e.g. ‘microsoft’ and ‘azure’).

The problem occurs when I search for the stored names in a text.

text_example = "This is a text containing microsoft azure."

results = search_client.search(
    search_text=text_example,
    include_total_count=True,
    select=["name"],
    search_fields=["name"],
    highlight_fields="name",
    query_type="full",
)

print("Total Documents Matching Query:", results.get_count())
for result in results:
    print(result)

I expect that searching with text_example will return a hit on ‘microsoft azure’, but it doesn’t. The result is empty. I suspect that because the same custom analyzer is used as both the index analyzer and the search analyzer, the entire text_example is tokenized as one token, which is not in the index, so nothing is returned.
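
The suspicion above can be checked without touching the service. The snippet below is only a local sketch: `keyword_lowercase_analyze` and `word_analyze` are hypothetical stand-ins written for illustration (they are not SDK functions and only approximate the real Lucene tokenizers). It shows that the whole text collapses into one term that is not in the index, and that plain word-level tokens do not equal the single indexed term either.

```python
import re


def keyword_lowercase_analyze(text: str) -> list[str]:
    """Mimic keyword_v2 + lowercase: the whole input becomes a single token."""
    return [text.lower()] if text else []


def word_analyze(text: str) -> list[str]:
    """Rough stand-in for a word-level analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


indexed_names = ["microsoft azure", "amazon aws", "google cloud"]

# Terms as they would sit in the index: one term per name
index_terms = {
    token for name in indexed_names for token in keyword_lowercase_analyze(name)
}

text_example = "This is a text containing microsoft azure."

# Same analyzer at search time: the entire text is one term
query_terms = keyword_lowercase_analyze(text_example)
keyword_matches = [term for term in query_terms if term in index_terms]

# Word-level tokens at search time: 'microsoft' and 'azure' are separate terms
word_terms = word_analyze(text_example)
word_matches = [term for term in word_terms if term in index_terms]

print(query_terms)      # ['this is a text containing microsoft azure.']
print(keyword_matches)  # []
print(word_matches)     # []
```

Only a search-time term that is exactly `microsoft azure` can hit the indexed term, which is the constraint any fix has to satisfy.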

Is there an efficient way to search for multi-word tokens inside a long text using Azure Cognitive Search?

2 Answers


  1. For the index analyzer, it's fine to use keyword_v2 if you are filtering on that field. For your use case, though, I would rather suggest using a standard analyzer for both indexing and search, and then using a phrase search that requires the terms to appear together.
    So your text_example would be: This is a text containing "microsoft azure".
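
A minimal sketch of that phrase search, assuming `search_client` is an `azure.search.documents.SearchClient` bound to the index. The helper names `quote_phrase` and `search_name_phrase` are made up for this example; the keyword arguments are the same ones used in the question.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in double quotes, escaping backslashes and embedded quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_name_phrase(search_client, phrase: str):
    """Run a phrase query against the 'name' field.

    `search_client` is assumed to be an azure.search.documents.SearchClient;
    any object with a compatible .search() method works for local testing.
    """
    return search_client.search(
        search_text=quote_phrase(phrase),
        search_fields=["name"],
        select=["name"],
        query_type="full",
        include_total_count=True,
    )


print(quote_phrase("microsoft azure"))  # "microsoft azure"
```

Note that this only helps when the phrase to look for is already known; it does not by itself pick the name out of a longer text.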

  2. To resolve this issue, you can try using a different analyzer for the search query.

    You can create a custom analyzer that uses the standard_v2 tokenizer for the search query and apply it to the search text.

    Below is the updated analyzer and schema code snippet:

    
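    # Index-time analyzer: keeps the whole name as one lowercase token
    name_analyzer = CustomAnalyzer(name="name_analyzer", tokenizer_name="keyword_v2", token_filters=["lowercase"])
    # Search-time analyzer built on the standard_v2 tokenizer, as described above
    name_search_analyzer = CustomAnalyzer(name="name_search_analyzer", tokenizer_name="standard_v2", token_filters=["lowercase"])

    fields = [
        SimpleField(name="key", type=SearchFieldDataType.String, key=True),
        SearchableField(name="name", type=SearchFieldDataType.String, index_analyzer_name="name_analyzer", search_analyzer_name="name_search_analyzer", searchable=True),
    ]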
    
    

    With the above schema I created an index and uploaded sample data:

    documents = [
        {"key": "1", "name": "microsoft azure"},
        {"key": "2", "name": "amazon aws"},
        {"key": "3", "name": "google cloud"},
    ]
    

    With the above setup, I was able to get the required results.

    Search Query Code:

    text_example = "This is a text containing microsoft azure."
    
    results = search_client.search(search_text=f"name: '{text_example}'", include_total_count=True, select=['name'], search_fields=['name'], highlight_fields='name', query_type="full")
    
    print('Total Documents Matching Query:', results.get_count())
    for result in results:
        print(result)
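
For a search-time token to equal the single indexed term `microsoft azure`, the query has to contain that exact two-word unit. One way to get there, sketched below as an assumption rather than as part of either answer, is to cut the long text into word shingles (n-grams of words) on the client and OR them together as quoted phrases. `word_shingles` and `build_phrase_query` are hypothetical helper names, and the approach assumes the search-time analyzer on `name` keeps each quoted phrase as one token, as the keyword_v2 analyzer from the question should. It has not been verified against a live service.

```python
import re


def word_shingles(text: str, min_size: int = 2, max_size: int = 3) -> list[str]:
    """Return lowercase word n-grams (shingles) of min_size..max_size words."""
    words = re.findall(r"\w+", text.lower())
    shingles = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            shingles.append(" ".join(words[start:start + size]))
    return shingles


def build_phrase_query(shingles: list[str]) -> str:
    """OR together quoted shingles for use with query_type="full"."""
    return " OR ".join(f'"{shingle}"' for shingle in shingles)


text_example = "This is a text containing microsoft azure."
candidates = word_shingles(text_example, min_size=2, max_size=2)
query = build_phrase_query(candidates)

print(candidates)
# ['this is', 'is a', 'a text', 'text containing', 'containing microsoft', 'microsoft azure']
```

The resulting `query` string would be passed as `search_text` with `search_fields=["name"]` and `query_type="full"`. A long text produces many clauses, so the shingles may need to be sent in batches. Azure also offers a shingle token filter for custom analyzers, which could move this step to the service side.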
    
