Azure - DISTINCT values inside different vectors in a JSON file

ann
April 30, 2023
241 views
1 vote
2 Answers

I have for example the below JSON file:

[
   {
        "id": "document1",
        "source": {
            "pass": {
                "ploy": {
                    "school": [{
                            "sourcen": "data"
                        },
                        {
                            "sourcen": "data1"
                        },
                        {
                            "sourcen": "data1"
                        }
                    ]
                }
            }
        }
    },
    {
        "id": "document2",
        "source": {
            "pass": {
                "ploy": {
                    "school": [{
                            "sourcen": "data1"
                        },
                        {
                            "sourcen": "data2"
                        },
                        {
                            "sourcen": "data3"
                        },
                        {
                            "sourcen": "data3"
                        }
                    ]
                }
            }
        }
    }
]

and I want to Distinct inside the “school” vectors, so the output will be:

[
  {
   "$1": [
     "data",
     "data1"
    ]
},
  {
  "$1": [
    "data1",
    "data2",
    "data3"
      ]
  }
]

In CosmosDB ARRAY() function can be used to retrieve this data, but the application I am using does not support it.

I am running the below query now, but this is not correct as it distinct from all values together:
SELECT DISTINCT sourcen FROM [tablename.source.pass.ploy.school]

Could you please help? Is there any SQL query that outputs distinct data from different vectors?

Answers

- AVTUNEY
- May 1, 2023 at 8:03 am
- 0 votes
0
You can group the data by the values in the "sourcen" field and then use the STRING_AGG() to concatenate the distinct values for each group.
```
SELECT STRING_AGG(DISTINCT s, ',') AS $1
FROM c
JOIN s IN c.source.pass.ploy.school
GROUP BY s.sourcen
```
Login or Signup to reply.

- ImreP252hvel
- May 2, 2023 at 9:30 am
- 0 votes
0
I don’t know what you mean by "but the application I am using does not support it.". Query goes in, document comes out 😉 Your application should not care how the query is executed in CosmosDB server and should accept any valid SQL query.

Could it be that you client app just failed to bind those automatic names like $1?

Basically you can combine a DISTINCT subquery with ARRAY() function and provide explicit name to the constructed parts, like so:
```
SELECT c.id,
    ARRAY (select distinct value p.sourcen 
        from p in c.source.pass.ploy.school
    ) as sourcen
FROM c
```
Results in documents like:
```
[
    {
        "id": "test",
        "sourcen": [
            "data",
            "data1"
        ]
    }
]
```
There’s decent documentation on how to work with arrays in cosmosDB SQL queries, including how to construct them in Working with arrays and objects in Azure Cosmos DB.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Azure – DISTINCT values inside different vectors in a JSON file

Answers