In Azure Synapse pipeline I adjust loading from stage table to DWH on Azure Synapse SQL dedicated pool.
I have a source table where one column has xml data stored in the column as text (nvarchar(max)).
I need parse this xml from every row in the set of columns and rows and load into Azure Synapse SQL dedicated pool. The xml functions don’t apply here because Azure Synapse does not support xml.
Table is something like this:
I need such result table:
or such:
I tried to use Azure Synapse pipeline Data Flow Parse transformation, but for the third row returns only last element from xml (where CUSTOMERNO is 122 only).
Please could someone tell me how to parse the xml text into a set of rows?
Thanks in advance.
2
Answers
I created a table with xml data type column in azure sql database and inserted values into that.
students:
Image for reference:
I created another table and retrieve the data from the xml data type column into that table using below code
Output of table:
I did this in azure SQL database because azure synapse dedicated pool is not supported for xml data type.
we can copy above table to azure synapse dedicated pool using azure synapse pipeline copy activity.
I created dedicated SQL pool in azure synapse and created pipeline and performed copy activity using below procedure:
created azure SQL database dataset using azure SQL database linked service.
azure SQL database linked service:
Azure SQL database dataset:
Source of copy activity:
I created table in SQL pool- using below code:
Creating SQL dedicated pool linked service:
search for synapse dedicated pool in linked service options.
Image for reference:
select and click on continue and fill required details and click on OK.
Image for reference:
I gave sink as synapse dedicated pool database by enabling bulk insert option.
I debug the pipeline It run successfully.
Image for reference:
The table is copied successfully into dedicated pool.
Image for reference:
I think what @Amar has called out may work out if you want to use Azure SQL , but otherwise since you are already using the Synapse , you will have to explore SPARK .
https://kontext.tech/article/1091/extract-value-from-xml-column-in-pyspark-dataframe