I am loading .gzip files as binary into my raw container, and I am now wondering how to proceed in Azure Synapse Analytics. I would like to take the binary .gzip files, move them to a different folder, and store them in Parquet format with the following steps:
- Transform the .gzip files to JSON format
- Transform the JSON files to Parquet
I am new to pipelines and not sure when to use the Copy data activity vs. a Data flow, etc.
If someone could show the steps with screenshots, or explain them very clearly, it would be highly appreciated!
Thanks,
Anders
2 Answers
As per the official Microsoft documentation: the Binary dataset can only be used in the Copy activity, the GetMetadata activity, or the Delete activity. We cannot use a Binary dataset in a Data flow activity. Also, when using a Binary dataset, the service does not parse the file content but treats it as-is.

Even if you use a Binary dataset in a Copy activity, you cannot transform it into another format; you can only copy from a Binary dataset to another Binary dataset. Therefore, you need to change your approach and use a programmatic method for your use case.
This is a common pattern we use, especially for larger ZIP files from SFTP, which can take hours to download.
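The answer above does not spell the pattern out, but a common shape for it is: a Copy activity lands the archive from SFTP into the raw container as an untouched binary (Binary to Binary, which the Copy activity supports), and a notebook then unpacks it before the Parquet conversion. Below is a minimal sketch under that assumption, using the azure-storage-file-datalake SDK; the account URL, container, and file names are hypothetical:

```python
import io
import zipfile

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical account and paths -- adjust to your environment.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("raw")

# Download the archive that the Copy activity landed as a binary blob.
archive_bytes = fs.get_file_client("landing/export.zip").download_file().readall()

# Unpack each member and write it back as an individual file,
# ready for the JSON-to-Parquet notebook step shown earlier.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    for name in zf.namelist():
        out = fs.get_file_client(f"unzipped/{name}")
        out.upload_data(zf.read(name), overwrite=True)
```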