
In an Azure Synapse pipeline I am setting up the load from a staging table into the DWH on an Azure Synapse dedicated SQL pool.
I have a source table in which one column stores XML data as text (nvarchar(max)).
I need to parse this XML from every row into a set of columns and rows and load the result into the Azure Synapse dedicated SQL pool. The XML functions don't apply here because Azure Synapse does not support the xml data type.

The source table looks something like this:

[image: source table]

I need a result table like this:

[image: needed result set]

or like this:

[image: needed result set, alternative]

I tried the Parse transformation in an Azure Synapse pipeline Data Flow, but for the third row it returns only the last element from the XML (only the row where CUSTOMERNO is 122).

Could someone please tell me how to parse the XML text into a set of rows?
Thanks in advance.

2 Answers


  1. I created a table with an XML data type column in an Azure SQL database and inserted values into it.
    students:

    CREATE TABLE dbo.students (
        studentId       INT NOT NULL PRIMARY KEY,
        studentName     VARCHAR(20) NOT NULL,
        request         XML NOT NULL          -- the XML payload is stored here
    );

    INSERT INTO students (studentId, studentName, request)
    VALUES
         (1,'XXX','<Customers><row><CUSTOMERNO>100</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row> </Customers>'),
         (2,'YYY','<Customers><row><CUSTOMERNO>101</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>'),
         -- the third row carries three <row> elements
         (3,'ZZZ','<Customers><row><CUSTOMERNO>12</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>UPDATE</OPERATION><EMAIL>[email protected]</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>DELETE</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>');
    

    [image for reference]

    I created another table and retrieved the data from the XML column into it using the code below:

    CREATE TABLE dbo.studentTable (
        studentId       INT NOT NULL,
        studentName     VARCHAR(20) NOT NULL,
        customerno      INT NOT NULL,
        operation       VARCHAR(20) NOT NULL,
        email           VARCHAR(100) NOT NULL
    );

    -- nodes('Customers/row') shreds each XML value into one row per <row> element;
    -- value() then extracts the individual element values from each fragment
    INSERT INTO dbo.studentTable ( studentId, studentName, customerno, operation, email )
    SELECT
        s.studentId,
        s.studentName,
        c.c.value( '(CUSTOMERNO/text())[1]', 'INT' )         AS customerno,
        c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' )  AS operation,
        c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' )     AS email
    FROM dbo.students s
        CROSS APPLY s.request.nodes('Customers/row') c(c);
    

    [image: output of the table]
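
    Note that in the question the staging column is nvarchar(max) rather than xml. In that case (still in Azure SQL Database) the text can be converted to xml on the fly before calling nodes(); a minimal sketch, assuming the same dbo.students table but with request declared as NVARCHAR(MAX):

    SELECT
        s.studentId,
        s.studentName,
        c.c.value( '(CUSTOMERNO/text())[1]', 'INT' )         AS customerno,
        c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' )  AS operation,
        c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' )     AS email
    FROM dbo.students s
        CROSS APPLY (SELECT CAST(s.request AS XML)) x(doc)   -- convert the text column to xml
        CROSS APPLY x.doc.nodes('Customers/row') c(c);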

    I did this in an Azure SQL database because the Azure Synapse dedicated SQL pool does not support the xml data type.
    We can copy the table above to the dedicated SQL pool using an Azure Synapse pipeline Copy activity.
    I created a dedicated SQL pool in Azure Synapse, created a pipeline, and performed the copy using the procedure below:
    I created an Azure SQL Database dataset using an Azure SQL Database linked service.
    [image: Azure SQL Database linked service]

    [image: Azure SQL Database dataset]

    [image: source of the Copy activity]

    I created a table in the dedicated SQL pool using the code below:

    CREATE TABLE dbo.studentTable (
        studentId   INT NOT NULL,
        studentName VARCHAR(20) NOT NULL,
        customerno  INT NOT NULL,
        operation   VARCHAR(20) NOT NULL,
        email       VARCHAR(100) NOT NULL
    );
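
    Without a WITH clause the dedicated pool applies its defaults, i.e. ROUND_ROBIN distribution and a clustered columnstore index. If you prefer to state them explicitly (or want a different distribution later), a minimal sketch with the same columns would be:

    CREATE TABLE dbo.studentTable (
        studentId   INT NOT NULL,
        studentName VARCHAR(20) NOT NULL,
        customerno  INT NOT NULL,
        operation   VARCHAR(20) NOT NULL,
        email       VARCHAR(100) NOT NULL
    )
    WITH (
        DISTRIBUTION = ROUND_ROBIN,     -- default; use HASH(...) for large fact tables
        CLUSTERED COLUMNSTORE INDEX     -- default index type in a dedicated SQL pool
    );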
    

    Creating the dedicated SQL pool linked service:
    search for Azure Synapse dedicated SQL pool in the linked service options.
    [image for reference]

    Select it, click Continue, fill in the required details, and click OK.
    [image for reference]

    I set the sink to the Synapse dedicated SQL pool dataset and enabled the Bulk insert option.

    [image: sink of the Copy activity]

    I debugged the pipeline and it ran successfully.

    [image for reference]

    The table was copied successfully into the dedicated pool.

    [image for reference]

  2. I think what @Amar has called out may work if you want to use Azure SQL, but otherwise, since you are already using Synapse, you will have to explore Spark (see the linked article; a rough Spark SQL sketch of the same idea follows below).
    https://kontext.tech/article/1091/extract-value-from-xml-column-in-pyspark-dataframe
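
    As an untested sketch (assuming the staging data is exposed to Spark as a view named stage_customers, with a key column studentId and the XML text in a column named request), Spark SQL's built-in xpath() function can shred the string into one row per <row> element:

    -- xpath() returns an array with one entry per matching node;
    -- posexplode() turns that array into rows (pos is 0-based), and
    -- element_at() (1-based) picks the OPERATION/EMAIL value at the same position
    SELECT
        s.studentId,
        CAST(r.customerno AS INT)                                                   AS customerno,
        element_at(xpath(s.request, '/Customers/row/OPERATION/text()'), r.pos + 1)  AS operation,
        element_at(xpath(s.request, '/Customers/row/EMAIL/text()'), r.pos + 1)      AS email
    FROM stage_customers s
    LATERAL VIEW posexplode(xpath(s.request, '/Customers/row/CUSTOMERNO/text()')) r AS pos, customerno;

    The result can then be loaded into the dedicated SQL pool, for example with a Copy activity as in the answer above.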
