
In an Azure Synapse pipeline I am setting up the load from a staging table into the DWH on an Azure Synapse dedicated SQL pool.
I have a source table in which one column stores XML data as text (nvarchar(max)).
I need to parse this XML from every row into a set of columns and rows and load the result into the Azure Synapse dedicated SQL pool. The XML functions don't apply here because Azure Synapse does not support the xml data type.

The source table looks something like this:

[image: source table]

I need a result table like this:

[image: needed result set]

or like this:

[image: needed result set, alternative]

I tried the Parse transformation in an Azure Synapse pipeline Data Flow, but for the third row it returns only the last element from the XML (only the row where CUSTOMERNO is 122).

Could someone please tell me how to parse the XML text into a set of rows?
Thanks in advance.

2 Answers


  1. I created a table with an XML data type column in an Azure SQL database and inserted values into it.
    students:

    CREATE TABLE dbo.students (
        studentId       INT NOT NULL PRIMARY KEY,
        studentName     VARCHAR(20) NOT NULL,
        request         XML NOT NULL          -- the XML payload is stored here
    );

    INSERT INTO students (studentId, studentName, request)
    VALUES
         (1,'XXX','<Customers><row><CUSTOMERNO>100</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row> </Customers>'),
         (2,'YYY','<Customers><row><CUSTOMERNO>101</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>'),
         -- the third row carries three <row> elements
         (3,'ZZZ','<Customers><row><CUSTOMERNO>12</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>UPDATE</OPERATION><EMAIL>[email protected]</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>DELETE</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>');
    

    [image for reference]

    I created another table and retrieved the data from the XML column into it using the code below:

    CREATE TABLE dbo.studentTable (
        studentId       INT NOT NULL,
        studentName     VARCHAR(20) NOT NULL,
        customerno      INT NOT NULL,
        operation       VARCHAR(20) NOT NULL,
        email           VARCHAR(100) NOT NULL
    );

    -- nodes('Customers/row') shreds each XML value into one row per <row> element;
    -- value() then extracts the individual element values from each fragment
    INSERT INTO dbo.studentTable ( studentId, studentName, customerno, operation, email )
    SELECT
        s.studentId,
        s.studentName,
        c.c.value( '(CUSTOMERNO/text())[1]', 'INT' )         AS customerno,
        c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' )  AS operation,
        c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' )     AS email
    FROM dbo.students s
        CROSS APPLY s.request.nodes('Customers/row') c(c);
    

    [image: output of the table]
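
    Note that in the question the staging column is nvarchar(max) rather than xml. In that case (still in Azure SQL Database) the text can be converted to xml on the fly before calling nodes(); a minimal sketch, assuming the same dbo.students table but with request declared as NVARCHAR(MAX):

    SELECT
        s.studentId,
        s.studentName,
        c.c.value( '(CUSTOMERNO/text())[1]', 'INT' )         AS customerno,
        c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' )  AS operation,
        c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' )     AS email
    FROM dbo.students s
        CROSS APPLY (SELECT CAST(s.request AS XML)) x(doc)   -- convert the text column to xml
        CROSS APPLY x.doc.nodes('Customers/row') c(c);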

    I did this in an Azure SQL database because the Azure Synapse dedicated SQL pool does not support the xml data type.
    We can copy the table above to the dedicated SQL pool using an Azure Synapse pipeline Copy activity.
    I created a dedicated SQL pool in Azure Synapse, created a pipeline, and performed the copy using the procedure below:
    I created an Azure SQL Database dataset using an Azure SQL Database linked service.
    [image: Azure SQL Database linked service]

    [image: Azure SQL Database dataset]

    [image: source of the Copy activity]

    I created a table in the dedicated SQL pool using the code below:

    CREATE TABLE dbo.studentTable (
        studentId   INT NOT NULL,
        studentName VARCHAR(20) NOT NULL,
        customerno  INT NOT NULL,
        operation   VARCHAR(20) NOT NULL,
        email       VARCHAR(100) NOT NULL
    );
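
    Without a WITH clause the dedicated pool applies its defaults, i.e. ROUND_ROBIN distribution and a clustered columnstore index. If you prefer to state them explicitly (or want a different distribution later), a minimal sketch with the same columns would be:

    CREATE TABLE dbo.studentTable (
        studentId   INT NOT NULL,
        studentName VARCHAR(20) NOT NULL,
        customerno  INT NOT NULL,
        operation   VARCHAR(20) NOT NULL,
        email       VARCHAR(100) NOT NULL
    )
    WITH (
        DISTRIBUTION = ROUND_ROBIN,     -- default; use HASH(...) for large fact tables
        CLUSTERED COLUMNSTORE INDEX     -- default index type in a dedicated SQL pool
    );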
    

    Creating the dedicated SQL pool linked service:
    search for Azure Synapse dedicated SQL pool in the linked service options.
    [image for reference]

    Select it, click Continue, fill in the required details, and click OK.
    [image for reference]

    I set the sink to the Synapse dedicated SQL pool dataset and enabled the Bulk insert option.

    [image: sink of the Copy activity]

    I debugged the pipeline and it ran successfully.

    [image for reference]

    The table was copied successfully into the dedicated pool.

    [image for reference]

  2. I think what @Amar has called out may work if you want to use Azure SQL, but otherwise, since you are already using Synapse, you will have to explore Spark (see the linked article; a rough Spark SQL sketch of the same idea follows below).
    https://kontext.tech/article/1091/extract-value-from-xml-column-in-pyspark-dataframe
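
    As an untested sketch (assuming the staging data is exposed to Spark as a view named stage_customers, with a key column studentId and the XML text in a column named request), Spark SQL's built-in xpath() function can shred the string into one row per <row> element:

    -- xpath() returns an array with one entry per matching node;
    -- posexplode() turns that array into rows (pos is 0-based), and
    -- element_at() (1-based) picks the OPERATION/EMAIL value at the same position
    SELECT
        s.studentId,
        CAST(r.customerno AS INT)                                                   AS customerno,
        element_at(xpath(s.request, '/Customers/row/OPERATION/text()'), r.pos + 1)  AS operation,
        element_at(xpath(s.request, '/Customers/row/EMAIL/text()'), r.pos + 1)      AS email
    FROM stage_customers s
    LATERAL VIEW posexplode(xpath(s.request, '/Customers/row/CUSTOMERNO/text()')) r AS pos, customerno;

    The result can then be loaded into the dedicated SQL pool, for example with a Copy activity as in the answer above.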
