I have files uploaded to an on-prem folder daily. A pipeline pulls them into a blob storage container (input), and a second pipeline copies from blob (input) to blob (output); this is where the data flow runs, between those two blobs. Finally, output is linked to SQL. I want the blob-to-blob pipeline to pick up only the file that was uploaded that day and run it through the data flow. With my current setup, every time the pipeline runs it duplicates my files. I've attached images below.
[![Blob to Blob Pipeline][1]][1]

Please let me know if there is anything else that would make this clearer.
[1]: https://i.stack.imgur.com/24Uky.png
2 Answers
I was able to solve this by selecting "Delete source files" in the data flow. This way the first pipeline pulls the new daily report into input, and when the second pipeline (with the data flow) moves the file from input to output, it deletes the file in input, so it cannot be picked up again and duplicated on the next run.
To achieve the above scenario, you can use **Filter by last modified** in the Get Metadata activity, passing dynamic content as below:

- Start time: `@startOfDay(utcnow())` — the start of the current day.
- End time: `@utcnow()` — the current timestamp.

Input and output of the Get Metadata activity (it filters files for that day only):
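As a rough sketch, the filter can be expressed in the pipeline's JSON definition roughly as below. The activity name, dataset name, and store-settings type are placeholders; property names are from memory of the ADF schema, so verify against the JSON your own pipeline generates:

```json
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "InputBlobDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ],
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "modifiedDatetimeStart": "@startOfDay(utcnow())",
            "modifiedDatetimeEnd": "@utcnow()"
        }
    }
}
```

With `childItems` in the field list, the activity output contains only the blobs whose last-modified timestamp falls between the start of the current day and now.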
If there are multiple files for a particular day, use a ForEach activity and pass the output of the Get Metadata activity to the ForEach as its items.

Then add a Data Flow activity inside the ForEach and create a source dataset with a filename parameter.

Use that filename parameter as a dynamic value in the file name, and then pass the source parameter `filename` as:

`@item().name`

This will run the data flow once for each file the Get Metadata activity returns.
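A hedged sketch of the ForEach wiring in pipeline JSON. The activity names (`Get Metadata1`, `ForEach1`), the data flow name, the source stream name `source1`, and the parameter name `filename` are all assumptions; the exact placement of `datasetParameters` may differ in your ADF version, so compare with the JSON view of your own pipeline:

```json
{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "DataFlow1",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "BlobToBlobDataflow",
                        "type": "DataFlowReference",
                        "datasetParameters": {
                            "source1": { "filename": "@item().name" }
                        }
                    }
                }
            }
        ]
    }
}
```

Each iteration binds one file name from the Get Metadata output to the source dataset's `filename` parameter, so the data flow processes exactly the files modified that day.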