I have to copy a file from an HTTP source to Azure Blob Storage (ABS) using a copy activity in Azure Data Factory (ADF).
The fully-qualified path to the file has a date-stamp in it, so it keeps changing (e.g., http://www.example.com/files/2022-12-13.zip). Further, I want to expand it into a directory in ABS that is also named based on the date (e.g., <blob>/2022-12-13/).
Is there a way to do this in ADF (preferably one that doesn’t involve writing code)?
2 Answers
I had a similar requirement recently and ended up solving it with code. You can either use an Azure Function to get the list of files from your data lake folder, or use a Synapse Notebook. Based on your requirements, you can then pick the latest/earliest file (or apply some other criterion) within that specific blob folder. Here's how I did it:
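A minimal sketch of that approach in Python, assuming the azure-storage-blob SDK and a connection string in an environment variable; the container name and folder prefix used later are placeholders:

```python
# Minimal sketch: find the most recently modified blob under a folder prefix.
# Requires the azure-storage-blob package; the connection string is read from
# an environment variable (placeholder name).
import os
from azure.storage.blob import BlobServiceClient

def get_latest_blob_name(container_name: str, folder_prefix: str) -> str:
    """Return the name of the most recently modified blob under folder_prefix."""
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    container = service.get_container_client(container_name)
    blobs = container.list_blobs(name_starts_with=folder_prefix)
    # Pick the blob with the latest last_modified timestamp;
    # raises ValueError if the folder is empty.
    latest = max(blobs, key=lambda blob: blob.last_modified)
    return latest.name
```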
And then just call the function:
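For example (container and folder names are placeholders):

```python
# Example call: returns something like "files/2022-12-13.zip".
latest_file = get_latest_blob_name("mycontainer", "files/")
print(latest_file)
```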
Since your source is HTTP, you can build the URL dynamically, e.g. http://www.example.com/files/yyyy-MM-dd.zip, where yyyy-MM-dd is today's date. In your copy data activity, create a source dataset for the HTTP source: give it http://www.example.com/files/ as the base URL, set the compression type to ZipDeflate so the zip is expanded during the copy, and don't select the Preserve zip file name as folder option (otherwise the files are extracted into a subfolder named after the zip instead of directly into your target folder). The relative URL is then built with a dynamic content expression.
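For example, assuming you always want the current day's file, the relative URL can be an expression along the lines of `@{formatDateTime(utcnow(), 'yyyy-MM-dd')}.zip`, or equivalently `@concat(formatDateTime(utcnow(), 'yyyy-MM-dd'), '.zip')`.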
Now for the sink, create your dataset for blob storage. Since you want to store the output in a folder named with yyyy-MM-dd, build the directory part of the sink dataset's file path dynamically as well.
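For instance, setting the directory to an expression such as `@{formatDateTime(utcnow(), 'yyyy-MM-dd')}` produces a path like <blob>/2022-12-13/ for a run on that date.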