I have files uploaded to an on-prem folder daily. A pipeline pulls them into a blob storage container (input), and a second pipeline copies from blob (input) to blob (output); this is where the data flow runs, between those two blobs. Finally, output is linked to SQL. I want the blob-to-blob pipeline to pick up only the file that was uploaded that day and run it through the data flow. With my current setup, every time the pipeline runs it duplicates my files. I've attached images below.
[![Blob to Blob Pipeline][1]][1]

Please let me know if there is anything else that would make this clearer.
[1]: https://i.stack.imgur.com/24Uky.png
2 Answers
I was able to solve this by selecting "Delete source files" in the data flow. This way the first pipeline pulls the new daily report into input, and when the second pipeline (with the data flow) moves the file from input to output, it deletes the file in input, so it cannot be picked up again and duplicated on the next run.
To achieve the above scenario, you can use **Filter by last modified** in the Get Metadata activity, passing dynamic content as below:

- Start time: `@startOfDay(utcnow())` — the start of the current day.
- End time: `@utcnow()` — the current timestamp.

Input and output of the Get Metadata activity (it filters files for that day only):
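As a rough sketch, the filter can be expressed in the pipeline's JSON definition roughly as below. The activity name, dataset name, and store-settings type are placeholders; property names are from memory of the ADF schema, so verify against the JSON your own pipeline generates:

```json
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "InputBlobDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ],
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "modifiedDatetimeStart": "@startOfDay(utcnow())",
            "modifiedDatetimeEnd": "@utcnow()"
        }
    }
}
```

With `childItems` in the field list, the activity output contains only the blobs whose last-modified timestamp falls between the start of the current day and now.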
If there are multiple files for a particular day, use a ForEach activity and pass the output of the Get Metadata activity to the ForEach as its items.

Then add a Data Flow activity inside the ForEach and create a source dataset with a filename parameter.

Use that filename parameter as a dynamic value in the file name, and then pass the source parameter `filename` as:

`@item().name`

This will run the data flow once for each file the Get Metadata activity returns.
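A hedged sketch of the ForEach wiring in pipeline JSON. The activity names (`Get Metadata1`, `ForEach1`), the data flow name, the source stream name `source1`, and the parameter name `filename` are all assumptions; the exact placement of `datasetParameters` may differ in your ADF version, so compare with the JSON view of your own pipeline:

```json
{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "DataFlow1",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "BlobToBlobDataflow",
                        "type": "DataFlowReference",
                        "datasetParameters": {
                            "source1": { "filename": "@item().name" }
                        }
                    }
                }
            }
        ]
    }
}
```

Each iteration binds one file name from the Get Metadata output to the source dataset's `filename` parameter, so the data flow processes exactly the files modified that day.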