I have a CSV file in the following format, which I want to copy from an external share to my data lake:
Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"
I now want to load it with a Copy Data task in an Azure Synapse pipeline. The result is the following:
Test; Text
"1";" "This is a text"
"which goes on on a second line";
"and on on a third line"";
"2";" "Another Test""
So, as you can see, it is not handling the multi-line text correctly, and I don't see an option to handle multi-line text within a Copy Data task. Unfortunately, I'm not able to use a Data Flow task, because it does not allow running on an external Azure runtime, which I'm forced to use for security reasons.
Of course, I'm not talking about just this single test file; I have many thousands of files like it.
My settings for the CSV file are as follows:
Can someone tell me how to handle this kind of multiline data correctly?
Do I have any other options within Synapse (apart from the Dataflows)?
Thanks a lot for your help
2 Answers
Well, it turns out this is not possible with a CSV file. The pragmatic solution is to transfer the files as "binary" instead, and only load and transform them later with a Python notebook in Synapse.
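To illustrate the notebook step: Python's built-in csv module handles quoted multi-line fields correctly, so once the files have been copied as binary, parsing them is straightforward. A minimal sketch (the inline sample stands in for a file read from the data lake; in practice you would open the copied file instead):

```python
import csv
import io

# Sample content matching the question; in a real notebook you would
# read this from the file copied to the data lake.
raw = '''Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"
'''

# skipinitialspace handles the blank after each semicolon, so the
# quote that follows is still recognized as a quote character.
reader = csv.reader(
    io.StringIO(raw),
    delimiter=";",
    quotechar='"',
    skipinitialspace=True,
)
rows = list(reader)
for row in rows:
    print(row)
```

The multi-line field comes back as a single value containing embedded newlines, which is exactly what the Copy Data task fails to produce.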
You can achieve this in Azure Data Factory by iterating through all lines and checking for the delimiter in each line. Then use string manipulation functions with Set Variable activities to convert the multi-line data into a single line.
Look at the following example. I have a Set Variable activity with an empty value (taken from a parameter) for the req variable. The ForEach activity iterates over @range(0, sub(activity('Lookup1').output.count, 1)). Inside the ForEach, I have an If activity with the following condition, and I build the req variable using 2 Set Variable activities.
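The merging logic that the pipeline implements with Set Variable activities can be sketched in Python (a hypothetical illustration, not the ADF expressions themselves): lines are accumulated into a buffer until the number of double-quote characters is even, meaning no field is left open, and only then is the record emitted.

```python
# The raw lines as the Copy Data task would see them.
lines = [
    'Test; Text',
    '"1"; "This is a text',
    'which goes on on a second line',
    'and on on a third line"',
    '"2"; "Another Test"',
]

records = []
buffer = ""
for line in lines:
    # Append the current line to any record still being built.
    buffer = line if not buffer else buffer + " " + line
    # An even number of double quotes means no quoted field is open,
    # so the record is complete.
    if buffer.count('"') % 2 == 0:
        records.append(buffer)
        buffer = ""

for r in records:
    print(r)
```

This joins the three physical lines of the multi-line record into one logical line, which a downstream delimited-text dataset can then parse normally. Note the quote-counting heuristic assumes fields never contain escaped double quotes.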