skip to Main Content

I have for example the below JSON file:

[
   {
        "id": "document1",
        "source": {
            "pass": {
                "ploy": {
                    "school": [{
                            "sourcen": "data"
                        },
                        {
                            "sourcen": "data1"
                        },
                        {
                            "sourcen": "data1"
                        }
                    ]
                }
            }
        }
    },
    {
        "id": "document2",
        "source": {
            "pass": {
                "ploy": {
                    "school": [{
                            "sourcen": "data1"
                        },
                        {
                            "sourcen": "data2"
                        },
                        {
                            "sourcen": "data3"
                        },
                        {
                            "sourcen": "data3"
                        }
                    ]
                }
            }
        }
    }
]

and I want to Distinct inside the “school” vectors, so the output will be:

[
  {
   "$1": [
     "data",
     "data1"
    ]
},
  {
  "$1": [
    "data1",
    "data2",
    "data3"
      ]
  }
]

In CosmosDB ARRAY() function can be used to retrieve this data, but the application I am using does not support it.

I am running the below query now, but this is not correct as it distinct from all values together:
SELECT DISTINCT sourcen FROM [tablename.source.pass.ploy.school]

Could you please help? Is there any SQL query that outputs distinct data from different vectors?

2

Answers


  1. You can group the data by the values in the "sourcen" field and then use the STRING_AGG() to concatenate the distinct values for each group.

    SELECT STRING_AGG(DISTINCT s, ',') AS $1
    FROM c
    JOIN s IN c.source.pass.ploy.school
    GROUP BY s.sourcen
    
    Login or Signup to reply.
  2. I don’t know what you mean by "but the application I am using does not support it.". Query goes in, document comes out 😉 Your application should not care how the query is executed in CosmosDB server and should accept any valid SQL query.

    Could it be that you client app just failed to bind those automatic names like $1?

    Basically you can combine a DISTINCT subquery with ARRAY() function and provide explicit name to the constructed parts, like so:

    SELECT c.id,
        ARRAY (select distinct value p.sourcen 
            from p in c.source.pass.ploy.school
        ) as sourcen
    FROM c
    

    Results in documents like:

    [
        {
            "id": "test",
            "sourcen": [
                "data",
                "data1"
            ]
        }
    ]
    

    There’s decent documentation on how to work with arrays in cosmosDB SQL queries, including how to construct them in Working with arrays and objects in Azure Cosmos DB.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search