I have for example the below JSON file:
[
{
"id": "document1",
"source": {
"pass": {
"ploy": {
"school": [{
"sourcen": "data"
},
{
"sourcen": "data1"
},
{
"sourcen": "data1"
}
]
}
}
}
},
{
"id": "document2",
"source": {
"pass": {
"ploy": {
"school": [{
"sourcen": "data1"
},
{
"sourcen": "data2"
},
{
"sourcen": "data3"
},
{
"sourcen": "data3"
}
]
}
}
}
}
]
and I want to Distinct inside the “school” vectors, so the output will be:
[
{
"$1": [
"data",
"data1"
]
},
{
"$1": [
"data1",
"data2",
"data3"
]
}
]
In CosmosDB ARRAY() function can be used to retrieve this data, but the application I am using does not support it.
I am running the below query now, but this is not correct as it distinct from all values together:
SELECT DISTINCT sourcen FROM [tablename.source.pass.ploy.school]
Could you please help? Is there any SQL query that outputs distinct data from different vectors?
2
Answers
You can group the data by the values in the "sourcen" field and then use the
STRING_AGG()
to concatenate the distinct values for each group.I don’t know what you mean by "but the application I am using does not support it.". Query goes in, document comes out 😉 Your application should not care how the query is executed in CosmosDB server and should accept any valid SQL query.
Could it be that you client app just failed to bind those automatic names like
$1
?Basically you can combine a
DISTINCT
subquery withARRAY()
function and provide explicit name to the constructed parts, like so:Results in documents like:
There’s decent documentation on how to work with arrays in cosmosDB SQL queries, including how to construct them in Working with arrays and objects in Azure Cosmos DB.