As described in the title, I am facing a strange error while creating an indexer (using both the Azure portal and the REST API).
{
"error": {
"code": "",
"message": "Error with data source: Additional content found in JSON reference object. A JSON reference object should only have a $ref property. Path '$id'. Please adjust your data source definition in order to proceed."
}
}
The data source was created via the Azure portal without specifying a deletion or change detection strategy.
JSON structure in Cosmos DB (MongoDB):
Post collection
{
"_id": {
"$oid": "....."
},
"author": {
"$ref": "user",
"$id": {
"$oid": "...."
}
},
"_class": "com.community.domain.Post"
}
Below is the indexer definition:
{
"dataSourceName": "fshco-post",
"targetIndexName": "index",
"fieldMappings": [
{
"sourceFieldName": "_class",
"targetFieldName": "class"
}
],
"parameters": {
"batchSize": 1000,
"maxFailedItems": null,
"maxFailedItemsPerBatch": null
}
}
To confirm that the problem is the $ref attribute, I used a Post collection containing a single document without the $ref child attribute in the author field, and it was indexed successfully.
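The same check can be done without touching the indexer by scanning the documents for objects that carry a $ref key. The following is only a minimal sketch: it assumes the documents are available as plain Python dicts (for example, loaded from an Extended JSON export), and the function name find_dbref_paths is hypothetical.

```python
from typing import Any, Iterator


def find_dbref_paths(node: Any, path: str = "") -> Iterator[str]:
    """Yield the slash-separated path of every object that has a "$ref" key."""
    if isinstance(node, dict):
        if "$ref" in node:
            # The root object itself is reported as "/".
            yield path or "/"
        for key, value in node.items():
            yield from find_dbref_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_dbref_paths(item, f"{path}/{index}")


# Sample document shaped like the Post document in the question.
post = {
    "_id": {"$oid": "..."},
    "author": {"$ref": "user", "$id": {"$oid": "..."}},
    "_class": "com.community.domain.Post",
}

print(list(find_dbref_paths(post)))  # ['/author']
```

Any path reported here is a field that would likely trigger the same error in the indexer.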
I tried a skillset with a **ShaperSkill** to rename the $ref field, but that failed with the same error.
From this, I understand that the problem probably occurs in the document cracking phase, which runs before the skillset execution phase.
[Figure: indexing phases]
Below is the skillset definition that I used:
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "#1",
"description": null,
"context": "/document",
"inputs": [
{
"name": "refto",
"source": "/document/author/$ref"
},
{
"name": "id",
"source": "/document/author/$id"
}
],
"outputs": [
{
"name": "output",
"targetName": "post_author" --> same name as the index attribute
}
]
}
]
In the indexer:
"skillsetName": "skillpostshaper",
"outputFieldMappings": [
{
"sourceFieldName": "/document/post_author",
"targetFieldName": "post_author"
}
],
Is there anything obvious that I’ve missed?
3 Answers
This is the Microsoft team's answer to this question:
... It seems that MongoDB uses $ref, $id, and $db to create references to other documents in other collections. Newtonsoft uses $ref and $id for self-referential JSON. We're throwing an error because Newtonsoft is trying to deserialize the $ref object to something that doesn't exist (presumably because it's located in a separate doc in a separate MongoDB collection).
Our indexer is not configured to follow document references in MongoDB, so the best we could do is allow those system fields to be indexed as plain text (via MetadataPropertyHandling.Ignore). At the least, I can update the deserializer to not attempt to deserialize self-referential JSON for MongoDB collections, because presumably such a document would conflict with the system MongoDB fields. This requires a code fix and may take a few months to roll out to the customer.
I do not think this meets our bar for a hotfix, since it's not affecting any production services. But we will add it to our backlog to track.
In the immediate term, I'm afraid there's no workaround other than removing the column from the data source, if possible. Unfortunately, MongoDB indexers don't support custom queries at present.
Documentation updated: https://learn.microsoft.com/en-us/azure/search/search-howto-index-cosmosdb-mongodb#limitations
AFAIK, an index field name should not start with a special character, as mentioned here.
Using field mappings in the indexer, I mapped one field to another; below are the steps I followed:
Try to modify the data in the data source before indexing it, for example by removing the "$ref" property or renaming it to a different field name, so that it can be handled by the indexer.
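As a rough illustration of the renaming approach, the sketch below rewrites the reference keys inside any object that contains $ref. It is only a sketch under assumptions: the documents are plain Python dicts (for example, from an Extended JSON export), the replacement names in RENAMES are hypothetical, and writing the transformed documents back to the collection is not shown.

```python
from typing import Any

# Hypothetical replacement names; any name without a leading "$" should do.
RENAMES = {"$ref": "ref_collection", "$id": "ref_id", "$db": "ref_db"}


def rename_dbref_keys(node: Any) -> Any:
    """Return a copy of node with $ref/$id/$db renamed inside reference objects.

    Only objects that contain a "$ref" key are treated as references, so
    other "$"-prefixed keys such as "$oid" are left untouched.
    """
    if isinstance(node, dict):
        is_dbref = "$ref" in node
        return {
            (RENAMES.get(key, key) if is_dbref else key): rename_dbref_keys(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_dbref_keys(item) for item in node]
    return node


# Sample document shaped like the Post document in the question.
post = {
    "_id": {"$oid": "..."},
    "author": {"$ref": "user", "$id": {"$oid": "..."}},
    "_class": "com.community.domain.Post",
}

cleaned = rename_dbref_keys(post)
print(cleaned["author"])
# {'ref_collection': 'user', 'ref_id': {'$oid': '...'}}
```

The function returns a new structure rather than mutating the input, so the original document can be kept for comparison. Whether the renamed fields index cleanly still has to be verified against the actual indexer.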
The error you are facing is related to how you are using $ref in a JSON file and is not related to the indexer per se. The $ref keyword in JSON files is a "reserved word" that can only be used to point to a reference. As suggested in the comments, modifying the field name in the data source before indexing it, by removing the $ref property or renaming it to a different field name, should let the indexer read the JSON file in the way you expect. Here is the reference to JSON documentation for more information.