
I am facing a strange error when creating an indexer, both in the Azure portal and through the REST API:

{
    "error": {
        "code": "",
        "message": "Error with data source: Additional content found in JSON reference object. A JSON reference object should only have a $ref property. Path '$id'.  Please adjust your data source definition in order to proceed."
    }
}

The data source was created via the Azure portal without specifying a deletion detection or change detection policy.

JSON structure in Cosmos DB (MongoDB API), Post collection:

{
  "_id": {
    "$oid": "....."
  },
  "author": {
    "$ref": "user",
    "$id": {
      "$oid": "...."
    }
  },
  "_class": "com.community.domain.Post"
}

Below is the indexer definition:

{
    "dataSourceName": "fshco-post",
    "targetIndexName": "index",
    "fieldMappings": [
        {
            "sourceFieldName": "_class",
            "targetFieldName": "class"
        }
    ],
    "parameters": {
        "batchSize": 1000,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null
    }
}

To confirm that the $ref attribute is the problem, I used a Post collection containing a single document whose author field has no $ref child attribute, and that document was indexed successfully.

I also tried using the ShaperSkill to rename the $ref field, but it failed with the same error.
From that, I understand that the problem probably occurs in the document cracking phase, before the skillset execution phase.
(Figure: indexing phases)

Below is the skillset definition I used:

  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "#1",
      "description": null,
      "context": "/document",
      "inputs": [
        {
          "name": "refto",
          "source": "/document/author/$ref"
        },
        {
          "name": "id",
          "source": "/document/author/$id"
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "post_author"
        }
      ]
    }
  ]

The targetName post_author is the same name as the index field.

In the indexer definition:

    "skillsetName": "skillpostshaper",
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/post_author",
            "targetFieldName": "post_author"
        }
    ],

Is there anything obvious that I’ve missed?


Answers


  1. Chosen as BEST ANSWER

    This is the Microsoft team's answer to this question:

    ... It seems that MongoDB uses $ref, $id, and $db to create references to other documents in other collections. Newtonsoft uses $ref and $id for self-referential JSON. We're throwing an error because Newtonsoft is trying to deserialize the $ref object to something that doesn't exist (presumably because it's located in a separate doc in a separate MongoDB collection).

    Our indexer is not configured to follow document references in mongo, so the best we could do is allow those system fields to be indexed as plain text (via MetadataPropertyHandling.Ignore). At the least, I can update the deserializer to not attempt to deserialize self-referential json for Mongo collections because presumably such a document would conflict with the system mongo fields. This requires a code fix and may take a few months to roll out to the customer.

    I do not think this meets our bar for a hotfix, since it's not affecting any production services. But we will add it into our backlog track.
    In the immediate term, I'm afraid there's no workaround other than removing the column from the data source, if possible. Unfortunately, MongoDB indexers don't support custom queries at present.

    Documentation updated: https://learn.microsoft.com/en-us/azure/search/search-howto-index-cosmosdb-mongodb#limitations

    • The MongoDB attribute $ref is a reserved word. If you need this in your MongoDB collection, consider alternative solutions for populating an index.
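To see which documents would trip this limitation, here is a small sketch (not Microsoft's code) of the rule quoted in the error message, "a JSON reference object should only have a $ref property". It walks a parsed document and reports every object that has a $ref key alongside other keys, which is exactly the shape of a MongoDB DBRef such as author above. The helper name find_reference_conflicts and the path format are assumptions for illustration; the real Newtonsoft check may differ in detail (its error reported Path '$id').

```python
import json
from typing import Any, Iterator


def find_reference_conflicts(node: Any, path: str = "$") -> Iterator[str]:
    """Yield paths of objects that contain "$ref" plus any other key.

    Hypothetical helper approximating the rule from the indexer error:
    a JSON reference object should only have a $ref property.
    """
    if isinstance(node, dict):
        if "$ref" in node and len(node) > 1:
            yield path
        for key, value in node.items():
            yield from find_reference_conflicts(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_reference_conflicts(item, f"{path}[{index}]")


# The Post document from the question; the elided ObjectId values stay elided.
post = json.loads("""
{
  "_id": {"$oid": "....."},
  "author": {"$ref": "user", "$id": {"$oid": "...."}},
  "_class": "com.community.domain.Post"
}
""")

print(list(find_reference_conflicts(post)))  # ['$.author']
```

A document without the $ref child under author (the successful test in the question) yields an empty list.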

  2. AFAIK, index field names must not start with special characters, as mentioned in the documentation.
    Using a field mapping in the indexer, I mapped one field to another field. These are the steps I followed:

    1. Created the data source, index, and indexer.
    2. Added a new field named ref to the index.
    3. In the indexer, added a field mapping from the existing field HotelName to the ref field.
    4. After running the indexer, the ref field was populated with data.

    Try to modify the data in the data source before indexing it, for example by removing the "$ref" property or renaming it to a different field name, so that it can be handled by the indexer.
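A minimal sketch of the renaming approach, assuming you can run a one-off script over the documents (reading them from and writing them back to Cosmos DB is not shown). The DEFAULT_RENAMES mapping and the target names ref, refId, and refDb are hypothetical; choose names that match your index. Renaming changes the stored shape of author, so the application that writes these documents would need to follow the same convention.

```python
from typing import Any, Mapping

# Hypothetical renaming scheme for MongoDB DBRef keys; adjust the target
# names so they match the fields defined in your search index.
DEFAULT_RENAMES: Mapping[str, str] = {"$ref": "ref", "$id": "refId", "$db": "refDb"}


def rename_reserved_keys(node: Any, renames: Mapping[str, str] = DEFAULT_RENAMES) -> Any:
    """Return a copy of node with reserved keys renamed at every nesting level.

    The input is not modified. If a renamed key collides with an existing
    key in the same object, the value that appears later wins.
    """
    if isinstance(node, dict):
        return {
            renames.get(key, key): rename_reserved_keys(value, renames)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_reserved_keys(item, renames) for item in node]
    return node


# The Post document from the question; the elided ObjectId values stay elided.
post = {
    "_id": {"$oid": "....."},
    "author": {"$ref": "user", "$id": {"$oid": "...."}},
    "_class": "com.community.domain.Post",
}

print(rename_reserved_keys(post)["author"])
# {'ref': 'user', 'refId': {'$oid': '....'}}
```

Renaming keeps the referenced collection and id searchable as plain fields; removing the keys instead is simpler but loses that information.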

  3. The error you are facing is related to how $ref is used in the JSON document, not to the indexer itself. In the JSON Reference convention, $ref is a reserved keyword that can only be used to point to a reference. As suggested in the comments, modifying the data before indexing it, by removing the $ref property or renaming it to a different field name, should let the indexer read the JSON the way you expect.

    For more information, see the JSON Reference documentation.
