Azure Cognitive Search - Keyword search in long text - Not returning expected results

lexmwlab
August 7, 2023
2 Answers

Searching for entries in a keyword field stored in an Azure Cognitive Search index does not return the expected results when the query is a long text. Multi-word tokens such as ‘microsoft azure’ are not returned as a match when searching with the text "This text contains microsoft azure".

I'm working with Azure Cognitive Search through the Python SDK. Say I’m building a search index where each document in the index has a "name" field. The name (which can consist of multiple words) only makes sense if the entire name is tokenized as one token, so I use the "keyword_v2" tokenizer when building the analyzer for this field.

from azure.search.documents.indexes.models import (
    CustomAnalyzer, SearchableField, SearchFieldDataType, SimpleField,
)

# Define the custom analyzer for the name field: keyword_v2 emits the whole
# value as one token, and the lowercase filter normalizes case.
name_analyzer = CustomAnalyzer(
    name="name_analyzer",
    tokenizer_name="keyword_v2",
    token_filters=["lowercase"],
)

# Specify the index schema
fields = [
    SimpleField(name="key", type=SearchFieldDataType.String, key=True),
    SearchableField(name="name", type=SearchFieldDataType.String,
                    analyzer_name="name_analyzer"),
]

This works as expected when I test the analyzer using the REST API. As an example, I have the following indexed entries in the name field: [‘microsoft azure’, ‘amazon aws’, ‘google cloud’]. The custom analyzer I set up correctly tokenizes each entry as one token and not as multiple tokens (e.g. ‘microsoft’ and ‘azure’).

The problem occurs when I search for the stored names in a text.

text_example = "This is a text containing microsoft azure."

results = search_client.search(
    search_text=text_example,
    include_total_count=True,
    select=["name"],
    search_fields=["name"],
    highlight_fields="name",
    query_type="full",
)

print("Total Documents Matching Query:", results.get_count())
for result in results:
    print(result)

I expect that searching with text_example returns a hit on ‘microsoft azure’, but it doesn’t; the result is empty. I suspect that because the same custom analyzer serves as both the index analyzer and the search analyzer, the entire text_example is tokenized as one token, which is not in the index, so nothing is returned.
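The suspected failure mode can be reproduced without the service. The following is a minimal sketch, assuming a simplified stand-in for the keyword_v2 tokenizer plus lowercase filter; the function `keyword_analyze` is hypothetical and only imitates the behaviour described above, it is not the Azure implementation.

```python
def keyword_analyze(text):
    """Rough stand-in for keyword_v2 + lowercase: the whole input is one token."""
    return [text.lower()]


names = ["microsoft azure", "amazon aws", "google cloud"]
text_example = "This is a text containing microsoft azure."

# Index side: every name is stored as a single token.
indexed_tokens = {tok for name in names for tok in keyword_analyze(name)}

# Query side with the same analyzer: the whole sentence becomes one token.
query_tokens = keyword_analyze(text_example)

# Term matching needs a query token that equals an indexed token.
matches = [tok for tok in query_tokens if tok in indexed_tokens]

print(query_tokens)
print(matches)
```

Under these assumptions the only query token is the full lowercased sentence, so it cannot equal any stored name.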

Can I resolve this problem of searching for multi-word tokens in a long text in an efficient way using Azure Cognitive Search?

Answers

- AnuragSrivastava
- August 7, 2023 at 12:26 pm
For the index analyzer it is fine to use keyword_v2 if you filter on that field. For your use case, though, I would suggest using a standard analyzer for both indexing and search, and then a phrase search, which requires the quoted terms to appear together and in order.
So your text_example would be: This is a text containing "microsoft azure".
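To illustrate why a quoted phrase behaves differently from loose terms, here is a minimal sketch that imitates word-level tokenization and consecutive-term matching in plain Python. The helpers `standard_analyze` and `phrase_matches` are hypothetical simplifications, not the service's analyzer or scoring, and the sketch only models the quoted part of the query.

```python
import re


def standard_analyze(text):
    """Rough stand-in for a standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(phrase, field_value):
    """True if the phrase's tokens occur consecutively in the field's tokens."""
    p = standard_analyze(phrase)
    f = standard_analyze(field_value)
    n = len(p)
    return n > 0 and any(f[i:i + n] == p for i in range(len(f) - n + 1))


names = ["microsoft azure", "amazon aws", "google cloud"]
query = 'This is a text containing "microsoft azure".'

# Pull out the quoted phrases, as a phrase query would treat them.
phrases = re.findall(r'"([^"]+)"', query)
hits = [name for name in names if any(phrase_matches(p, name) for p in phrases)]

print(phrases)
print(hits)
```

Note that this approach presupposes the caller already knows which words to put in quotes.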


To resolve this issue, you can try using a different analyzer for the search query.

You can create a custom analyzer that uses the standard_v2 tokenizer for the search query and apply it to the search text.

Below is the updated analyzer and schema code snippet:


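# Index-time analyzer: the whole name is kept as one lowercase token.
name_analyzer = CustomAnalyzer(name="name_analyzer", tokenizer_name="keyword_v2", token_filters=["lowercase"])

# Query-time analyzer built on the standard_v2 tokenizer. "standard_v2" is a
# tokenizer name, not an analyzer name, so it is wrapped in a custom analyzer;
# the name "name_search_analyzer" is illustrative.
name_search_analyzer = CustomAnalyzer(name="name_search_analyzer", tokenizer_name="standard_v2", token_filters=["lowercase"])

fields = [
    SimpleField(name="key", type=SearchFieldDataType.String, key=True),
    # The Python SDK keyword arguments are index_analyzer_name and search_analyzer_name.
    SearchableField(name="name", type=SearchFieldDataType.String, index_analyzer_name="name_analyzer", search_analyzer_name="name_search_analyzer"),
]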

With the above schema I created an index and uploaded sample data:

documents = [
    {"key": "1", "name": "microsoft azure"},
    {"key": "2", "name": "amazon aws"},
    {"key": "3", "name": "google cloud"},
]

With the above setup I was able to get the required results.

Search Query Code:

text_example = "This is a text containing microsoft azure."

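# search_text carries the query string; with query_type="full" (Lucene syntax),
# name:(...) scopes the grouped terms to the name field.
results = search_client.search(search_text=f"name:({text_example})", include_total_count=True, select=['name'], search_fields=['name'], highlight_fields='name', query_type="full")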

print('Total Documents Matching Query:', results.get_count())
for result in results:
    print(result)
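Because term matching compares query tokens with indexed tokens, a query token has to equal the whole stored name when the field is keyword-tokenized. As a hedged illustration that is not part of either answer, the sketch below generates word n-grams (shingles) from the long text on the client side and keeps those that equal an indexed name. The helper `word_shingles` and the `max_words` limit are assumptions made for this example.

```python
import re


def word_shingles(text, max_words=3):
    """Yield every run of 1..max_words consecutive lowercase words in text."""
    words = re.findall(r"\w+", text.lower())
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


indexed_names = {"microsoft azure", "amazon aws", "google cloud"}
text_example = "This is a text containing microsoft azure."

# Keep only the shingles that are exactly equal to a stored name.
found = sorted({s for s in word_shingles(text_example) if s in indexed_names})

print(found)
```

Each surviving shingle could then be sent as its own query against the keyword-analyzed field, at the cost of more candidate terms for longer texts.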

