
I have a CSV file in the following format, which I want to copy from an external share to my data lake:

Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"

I now want to load it with a Copy Data activity in an Azure Synapse pipeline. The result is the following:

Test; Text
"1";" "This is a text"
"which goes on on a second line";
"and on on a third line"";
"2";" "Another Test""

So, as you can see, it is not handling the multi-line text correctly, and I do not see any option to handle multi-line text within a Copy Data activity. Unfortunately I am not able to use a Data Flow activity, because it cannot run on the external Azure runtime that I am forced to use for security reasons.
Of course I am not talking about this single test file only; I have many thousands of such files.

My settings for the CSV file are as follows:

[screenshot: CSV connector settings]

Can someone tell me how to handle this kind of multiline data correctly?
Do I have any other options within Synapse (apart from the Dataflows)?

Thanks a lot for your help

2 Answers


  1. Chosen as BEST ANSWER

    Well, it turns out this is not possible with a CSV dataset. The pragmatic solution is to transfer the CSV files as "binary" datasets instead, and only load and transform them later with a Python notebook in Synapse.
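    A minimal sketch of the notebook side, assuming pandas is available in the Synapse notebook; the inline sample stands in for a file copied as binary into the lake (there you would pass the abfss:// path instead):

```python
import io
import pandas as pd

# Illustrative stand-in for a binary-copied file; in a Synapse notebook
# you would read the actual abfss:// path of the landed file instead.
sample = ('Test; Text\n'
          '"1"; "This is a text\n'
          'spanning lines"\n'
          '"2"; "Another Test"\n')

# pandas honours quoted newlines, so the multi-line field survives intact.
df = pd.read_csv(io.StringIO(sample), sep=';', quotechar='"',
                 skipinitialspace=True)
print(df)
```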


    • You can achieve this in Azure Data Factory by iterating through all lines and checking each line for the delimiter. Then use string-manipulation functions with Set variable activities to convert the multi-line data to single lines.

    • Look at the following example. I have a Set variable activity that initializes the req variable with an empty value (taken from a parameter).

    [screenshot: Set variable activity initializing the req variable]

    • In a Lookup activity, create a dataset with the following configuration pointing at the multi-line CSV:

    [screenshot: Lookup dataset configuration]

    • In a ForEach activity, I iterate over each row by setting the items value to @range(0,sub(activity('Lookup1').output.count,1)). Inside the ForEach, I have an If activity with the following condition:
    @contains(activity('Lookup1').output.value[item()]['Prop_0'],';')
    
    • If this is true, I concatenate the current line onto the req variable using two Set variable activities.
    temp: @if(contains(activity('Lookup1').output.value[add(item(),1)]['Prop_0'],';'),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],decodeUriComponent('%0D%0A')),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' '))
    
    actual (req variable): @variables('val')
    

    [screenshot: true-branch Set variable activities]

    • For false, I handle the concatenation in the following way:
    temp1: @concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' ')
    
    actual1 (req variable): @variables('val2')
    

    [screenshot: false-branch Set variable activities]

    • I then use a final variable to handle the last line of the file, with the following dynamic content:
    @if(contains(last(activity('Lookup1').output.value)['Prop_0'],';'),concat(variables('req'),decodeUriComponent('%0D%0A'),last(activity('Lookup1').output.value)['Prop_0']),concat(variables('req'),last(activity('Lookup1').output.value)['Prop_0']))
    

    [screenshot: final Set variable activity]

    • Finally, I use a Copy data activity with a sample source file containing one column and one row (this acts as a carrier to copy the actual data).

    [screenshot: Copy data activity with sample source file]

    • Configure the source file as follows:

    [screenshot: source file configuration]

    • Create an additional column whose value is the final variable:

    [screenshot: additional column configuration]

    • Create a sink with the following configuration, and map only the column created above:

    [screenshot: sink configuration and mapping]

    • When I run the pipeline, I get the data in the required shape. The following output is for reference:

    [screenshot: pipeline output]
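    The per-line logic of the steps above (a physical line that contains the delimiter starts a new record; every other line is a continuation appended with a space) can be sketched outside ADF as follows. Note that, like the pipeline, this collapses the embedded line breaks into spaces, i.e. multi-line data becomes single-line:

```python
def join_multiline(lines, delim=';'):
    """Collapse continuation lines into the preceding record, mirroring
    the Set variable logic: a line containing the delimiter starts a new
    record, any other line is appended to the previous one with a space."""
    records = []
    for line in lines:
        if delim in line or not records:
            records.append(line)
        else:
            records[-1] += ' ' + line
    return records

# Physical lines of the sample file from the question.
physical = [
    'Test; Text',
    '"1"; "This is a text',
    'which goes on on a second line',
    'and on on a third line"',
    '"2"; "Another Test"',
]
print(join_multiline(physical))
```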
