I'm trying to copy data from SQL Server to Cosmos DB.
I have created a Data Flow to apply transformations and set the proper JSON schema for the items in the Cosmos DB container.
The SQL Server data looks like the following:
| Id | Name | Array |
|----|------|-------|
| 1 | Name1 | [{"property1":"value1","property2":"value2"},{"property1":"value1"}] |
| 2 | Name2 | [{"property1":"value1","property2":"value2"},{"property1":"value1"}] |
The documents in Cosmos DB should look like the following:
    {
        "id": "1",
        "name": "Name1",
        "array": [
            {
                "property1": "value1",
                "property2": "value2"
            },
            {
                "property1": "value1"
            }
        ]
    }
I've used the Derived Column and Parse transformations in the Data Flow, applying several functions in the expression builder, but I can't find a proper solution.
Is there a straightforward way in Azure Data Factory to parse the string into an actual array of JSON objects, without manipulating the string with functions such as split or replace?
2 Answers
I found a solution myself. The steps are the following:
1. Add a Parse transformation.
2. Under Parse settings, select JSON as the format if it is not already selected, and under JSON settings select Array of documents.
3. Finally, in the Output column type, define a complex schema for the array, as sketched below.
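A minimal sketch of that output column type, assuming the documents only carry the two properties shown in the sample data (in ADF Data Flow type syntax, an array of complex objects is written with a trailing []):

    (property1 as string, property2 as string)[]

With Array of documents selected, the Parse transformation applies this schema to every element of the array.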
To achieve your requirement, you can follow the steps below in an ADF data flow.
Step 1: Take the source transformation with the SQL table as the dataset.
Step 2: Take the derived column transformation and give the expression for the Array column as

    split(replace(replace(replace(Array,'[',''),']',''),'},{','}|{'),'|')

This splits the whole string into an array of strings, one per JSON object; the stages are traced below.
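For illustration, this is roughly what each stage of the expression produces for the sample value (a trace sketch, not actual data flow output):

    input:            [{"property1":"value1","property2":"value2"},{"property1":"value1"}]
    after replaces:   {"property1":"value1","property2":"value2"}|{"property1":"value1"}
    after split('|'): two separate strings, one per JSON object

Note that this relies on the values themselves containing no '[' , ']' or '},{' sequences; nested arrays would break the string manipulation.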
Step 3: Take the flatten transformation and give the unroll by column as Array.
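Flattening unrolls each array element into its own row, so for the sample data the intermediate result would look roughly like this:

| Id | Name | Array |
|----|------|-------|
| 1 | Name1 | {"property1":"value1","property2":"value2"} |
| 1 | Name1 | {"property1":"value1"} |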
Step 4: Take the Parse transformation to parse the JSON string in each row into a complex column.

Output column type expression used:

    (property1 as string, property2 as string)
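After parsing, each row holds a structured object instead of a string; a sketch of the expected shape:

| Id | Name | Array |
|----|------|-------|
| 1 | Name1 | (property1: "value1", property2: "value2") |
| 1 | Name1 | (property1: "value1") |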
Step 5: Take the aggregate transformation, group the data by Id and Name, and add an aggregate column named Array with the expression

    collect(Array)

This collects the parsed objects in each group back into a single array, restoring one row per Id; the expected output is sketched below.
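For the first sample row, the aggregated result should roughly match the target document (a sketch of the data preview):

    {
        "Id": 1,
        "Name": "Name1",
        "Array": [
            { "property1": "value1", "property2": "value2" },
            { "property1": "value1" }
        ]
    }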
Step 6: Add the sink transformation and give the Cosmos DB container as the sink dataset.
When you run the pipeline with the data flow, the data is copied to Cosmos DB with the Array column stored as an actual array of JSON objects.