
I will start with a complaint: I don’t know why MS had to rush this, but most of the Azure Search AI features seem to have been started and not completed, or at least there is no well-written documentation. My team is working on a search service for our large internal data set, and I am experimenting with Azure Search. One of the things I am trying out is the Custom Entity Lookup skill. With an inline definition, it works fine. However, since my custom entity definition is larger than 10 KB, I want to define my custom entities in an external file. I created one, uploaded it to Azure Blob Storage, and set the skill's entitiesDefinitionUri to the URL of the blob. However, I get the following error when I run the indexer:
"Failed to parse Custom Entity Definition file at https://xxx.blob.core.windows.net/search-experiment-data-container/guideline/terms.json as json."

I defined my file as:

[
   {
    "id": "someId",
    "name": "Some definition",
    "description": "Some description",
    "fuzzyEditDistance": 3,
    "defaultCaseSensitive": false

   }
]
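
For context, here is a minimal sketch of what a skill definition pointing at such a file might look like, written as a Python dict so it can be dumped to JSON. The `@odata.type` and `entitiesDefinitionUri` property come from the skill's documented schema; the input source path (`/document/content`) and the output target name (`matchedEntities`) are assumptions for illustration, and the URL is the redacted one from the error message above.

```python
import json

# Redacted URL copied from the error message; replace with the real blob URL.
ENTITIES_URI = (
    "https://xxx.blob.core.windows.net/"
    "search-experiment-data-container/guideline/terms.json"
)


def build_custom_entity_skill(entities_uri: str) -> dict:
    """Build a Custom Entity Lookup skill that loads entities from a URI."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.CustomEntityLookupSkill",
        "context": "/document",
        # External definition file, used instead of an inline definition.
        "entitiesDefinitionUri": entities_uri,
        # Assumed input path and output name, for illustration only.
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "entities", "targetName": "matchedEntities"}],
    }


skill = build_custom_entity_skill(ENTITIES_URI)
print(json.dumps(skill, indent=2))
```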

The documentation I looked at, https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-custom-entity-lookup#sample-skill-definition, shows the file as a CSV without giving an example of what this file looks like. Has anyone successfully used an external file for the custom entity lookup definition? Is there an authentication/authorization mechanism for accessing the file in the blob from the skill?


[EDIT]

There is also a badly designed conflict in the indexer between reading a data source and reading the custom entity definition.

The "default" parsing mode is used for a variety of data formats, not just JSON. When you switch to this mode, the indexer may interpret the structure of the data source differently, which can cause issues if the data source is specifically formatted for JSON.

Data sources and custom entity definitions should have been treated differently, as they are used in completely different contexts. My data source for the indexer is JSON and should be parsed as JSON, because I am mapping its fields to my index. My custom entity definition is JSON, not CSV, but the MS docs recommend using the default parsingMode when using the Custom Entity Lookup skill, which screws up everything, because you cannot use the default parsing mode for a JSON data source whose fields you map to an index.

The parsing mode setting in an indexer applies to how the indexer processes the data source. The recommendation to use the "default" parsing mode for custom entity definitions in a skillset conflicts with the requirement for a "json" parsing mode for JSON data sources.
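
To make the single-setting point concrete, here is a rough sketch of an indexer definition built as a Python dict. The `parameters.configuration.parsingMode` location is the documented place for this setting; every name passed in below (indexer, data source, index, skillset) is a hypothetical placeholder. There is only one `parsingMode` value per indexer, so it cannot be "json" and "default" at the same time.

```python
import json


def build_indexer(name, data_source, index, skillset, parsing_mode):
    """Build an indexer definition with a single parsingMode value."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        # One parsing mode for the whole indexer run.
        "parameters": {"configuration": {"parsingMode": parsing_mode}},
    }


# All names here are hypothetical placeholders.
indexer = build_indexer(
    name="guideline-indexer",
    data_source="guideline-blob-datasource",
    index="guideline-index",
    skillset="guideline-skillset",
    parsing_mode="json",
)
print(json.dumps(indexer, indent=2))
```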

So you cannot use a JSON data source with the Custom Entity Lookup skill.

2 Answers


  1. Chosen as BEST ANSWER

    It looks like a bug in Azure Search.

    There is an implementation conflict in the indexer between reading a data source and reading the custom entity definition.

    The "default" parsing mode is used for a variety of data formats, not just JSON. When you switch to this mode, the indexer may interpret the structure of the data source differently, which can cause issues if the data source is specifically formatted for JSON.

    Data sources and custom entity definitions should have been treated differently, as they are used in completely different contexts. My data source for the indexer is JSON and should be parsed as JSON, because I am mapping its fields to my index. My custom entity definition is JSON, not CSV, but the MS docs recommend using the default parsingMode when using the Custom Entity Lookup skill, which screws up everything, because you cannot use the default parsing mode for a JSON data source whose fields you map to an index.

    The parsing mode setting in an indexer applies to how the indexer processes the data source. The recommendation to use the "default" parsing mode for custom entity definitions in a skillset conflicts with the requirement for a "json" parsing mode for JSON data sources.

    So you cannot use a JSON data source with the Custom Entity Lookup skill.


  2. The .csv file format supports one entity per line, with aliases following the entity. You can find more documentation for it and the JSON files here.
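
As a rough illustration of that CSV layout (one entity per line, aliases after the entity name), here is a sketch that converts such text into the JSON array shape, with aliases as objects carrying a `text` property. The sample rows and the `csv_to_entities` helper are made up for this example and are not taken from the documentation.

```python
import csv
import io
import json

# Made-up sample: first cell is the entity name, the rest are aliases.
SAMPLE_CSV = """Contoso Guideline, CG, Contoso Guide
Fabrikam Policy, FP
Internal Handbook
"""


def csv_to_entities(text: str) -> list:
    """Convert 'name, alias, alias...' lines into a list of entity dicts."""
    entities = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # skip blank lines
        entity = {"name": cells[0]}
        if len(cells) > 1:
            entity["aliases"] = [{"text": alias} for alias in cells[1:]]
        entities.append(entity)
    return entities


entities = csv_to_entities(SAMPLE_CSV)
print(json.dumps(entities, indent=2))
```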

    I don’t immediately see anything invalid with the JSON sample you provided.
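
One way to double-check the file outside the service is to parse the exact bytes that were uploaded. The sketch below is only a local sanity check: it looks for a UTF-8 byte order mark, tries to parse the content as JSON, and checks that each entry has a `name`. Whether a byte order mark or encoding issue is what the indexer is tripping over is an assumption, not something confirmed here, and `check_definition_bytes` is a hypothetical helper.

```python
import codecs
import json


def check_definition_bytes(raw: bytes) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        problems.append(f"not valid UTF-8 JSON: {exc}")
        return problems
    if not isinstance(data, list):
        problems.append("top level is not a JSON array")
        return problems
    for i, entity in enumerate(data):
        if not isinstance(entity, dict) or not entity.get("name"):
            problems.append(f"entry {i} has no 'name'")
    return problems


# Same content as the sample in the question, as raw bytes.
sample = (
    b'[{"id": "someId", "name": "Some definition", '
    b'"description": "Some description", '
    b'"fuzzyEditDistance": 3, "defaultCaseSensitive": false}]'
)
print(check_definition_bytes(sample))
```

In practice you would read the bytes from the file you uploaded (for example with `open(path, "rb").read()`) rather than from a literal.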
