
I have a CSV file in the following format, which I want to copy from an external share to my data lake:

Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"

I now want to load it with a Copy Data activity in an Azure Synapse pipeline. The result is the following:

Test; Text
"1";" "This is a text"
"which goes on on a second line";
"and on on a third line"";
"2";" "Another Test""

So, as you can see, it is not handling the multi-line text correctly, and I do not see any option to handle multi-line text within a Copy Data activity. Unfortunately I am not able to use a Data Flow activity, because it cannot run on the external Azure runtime that I am forced to use for security reasons.
Of course I am not talking about this single test file only; I have many thousands of such files.

My settings for the CSV file are as follows:

[screenshot: CSV connector settings]

Can someone tell me how to handle this kind of multiline data correctly?
Do I have any other options within Synapse (apart from the Dataflows)?

Thanks a lot for your help

2 Answers


  1. Chosen as BEST ANSWER

    Well, it turns out this is not possible with a CSV dataset. The pragmatic solution is to transfer the CSV files as "binary" datasets instead, and only load and transform them later with a Python notebook in Synapse.
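    A minimal sketch of the notebook side, assuming pandas is available in the Synapse notebook; the inline sample stands in for a file copied as binary into the lake (there you would pass the abfss:// path instead):

```python
import io
import pandas as pd

# Illustrative stand-in for a binary-copied file; in a Synapse notebook
# you would read the actual abfss:// path of the landed file instead.
sample = ('Test; Text\n'
          '"1"; "This is a text\n'
          'spanning lines"\n'
          '"2"; "Another Test"\n')

# pandas honours quoted newlines, so the multi-line field survives intact.
df = pd.read_csv(io.StringIO(sample), sep=';', quotechar='"',
                 skipinitialspace=True)
print(df)
```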


    • You can achieve this in Azure Data Factory by iterating through all lines and checking each line for the delimiter. Then use string-manipulation functions with Set variable activities to convert the multi-line data to single lines.

    • Look at the following example. I have a Set variable activity that initializes the req variable with an empty value (taken from a parameter).

    [screenshot: Set variable activity initializing the req variable]

    • In a Lookup activity, create a dataset with the following configuration pointing at the multi-line CSV:

    [screenshot: Lookup dataset configuration]

    • In a ForEach activity, I iterate over each row by setting the items value to @range(0,sub(activity('Lookup1').output.count,1)). Inside the ForEach, I have an If activity with the following condition:
    @contains(activity('Lookup1').output.value[item()]['Prop_0'],';')
    
    • If this is true, I concatenate the current line onto the req variable using two Set variable activities.
    temp: @if(contains(activity('Lookup1').output.value[add(item(),1)]['Prop_0'],';'),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],decodeUriComponent('%0D%0A')),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' '))
    
    actual (req variable): @variables('val')
    

    [screenshot: true-branch Set variable activities]

    • For false, I handle the concatenation in the following way:
    temp1: @concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' ')
    
    actual1 (req variable): @variables('val2')
    

    [screenshot: false-branch Set variable activities]

    • I then use a final variable to handle the last line of the file, with the following dynamic content:
    @if(contains(last(activity('Lookup1').output.value)['Prop_0'],';'),concat(variables('req'),decodeUriComponent('%0D%0A'),last(activity('Lookup1').output.value)['Prop_0']),concat(variables('req'),last(activity('Lookup1').output.value)['Prop_0']))
    

    [screenshot: final Set variable activity]

    • Finally, I use a Copy data activity with a sample source file containing one column and one row (this acts as a carrier to copy the actual data).

    [screenshot: Copy data activity with sample source file]

    • Configure the source file as follows:

    [screenshot: source file configuration]

    • Create an additional column whose value is the final variable:

    [screenshot: additional column configuration]

    • Create a sink with the following configuration, and map only the column created above:

    [screenshot: sink configuration and mapping]

    • When I run the pipeline, I get the data in the required shape. The following output is for reference:

    [screenshot: pipeline output]
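    The per-line logic of the steps above (a physical line that contains the delimiter starts a new record; every other line is a continuation appended with a space) can be sketched outside ADF as follows. Note that, like the pipeline, this collapses the embedded line breaks into spaces, i.e. multi-line data becomes single-line:

```python
def join_multiline(lines, delim=';'):
    """Collapse continuation lines into the preceding record, mirroring
    the Set variable logic: a line containing the delimiter starts a new
    record, any other line is appended to the previous one with a space."""
    records = []
    for line in lines:
        if delim in line or not records:
            records.append(line)
        else:
            records[-1] += ' ' + line
    return records

# Physical lines of the sample file from the question.
physical = [
    'Test; Text',
    '"1"; "This is a text',
    'which goes on on a second line',
    'and on on a third line"',
    '"2"; "Another Test"',
]
print(join_multiline(physical))
```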
