I’m currently working on a project where I need a Data Factory pipeline to copy files based on the last run date.
The process breakdown:
- Data is ingested into a storage account
- The ingested data is in the directory format topic/yyyy/mm/dd, i.e., multiple files are brought into a single directory per day, so the files are partitioned by year, month, and day.
- The process currently filters on the last high-water-mark (HWM) date, which updates each time the pipeline runs; the pipeline triggers daily at 4 AM. Once the copy succeeds, a Set Variable activity increases the HWM value by 1 (i.e., one day). However, files are not brought over on weekends (this is the problem).
- The HWM date value will not increase if no files are brought over, so the pipeline keeps looping on the same date.
- How do I get the pipeline to advance, or to look for the next file in that directory, given that I use the HWM as the directory path to the file, and copy and update the HWM value dynamically only once the copy completes? Current update logic:
- Current lookup of the HWM value and the directory path used to copy the files:
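For reference, a minimal sketch of the increment and path logic described above, assuming the last HWM is read by a Lookup activity named Lookup1 as an ISO date string (yyyy-MM-dd) and kept in a pipeline variable hwm_date (all names here are assumptions). The Set Variable value that advances the mark by one day:

@addDays(activity('Lookup1').output.firstRow.watermark_value, 1, 'yyyy-MM-dd')

and a source folder path derived from that value might look like:

@concat('topic/', formatDateTime(variables('hwm_date'), 'yyyy/MM/dd'))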
2 Answers
Instead of adding 1 to the last high-water-mark value, you can update the watermark with the current UTC time, so that even when the pipeline is not triggered on some days, data will still be copied to the correct destination folder. I have tried to repro this in my environment, and below is the approach.
The Copy activity's source query filters on the watermark returned by the Lookup activity:

select * from tab1 where lastmodified > '@{activity('Lookup1').output.firstRow.watermark_value}'
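To advance the watermark after a successful copy, the current UTC time can then be written back, for example with a Stored Procedure or Script activity. A minimal sketch, assuming a watermark table named watermarktable with columns table_name and watermark_value (names are assumptions):

UPDATE watermarktable SET watermark_value = '@{utcnow()}' WHERE table_name = 'tab1'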
@concat(formatDateTime(utcnow(),'yyyy'),'/',formatDateTime(utcnow(),'MM'),'/',formatDateTime(utcnow(),'dd'))

is given in the folder path. (Note that 'MM' is the month specifier; lowercase 'mm' would return the minutes.)
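For example, a run at 2023-01-16 04:00 UTC evaluates this to 2023/01/16, so each run writes to the folder for its own run date regardless of when the previous run happened.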
If I understand the ask correctly after reading the post a couple of times: use the dayOfWeek() function. Add an If Condition activity and let the current logic execute only when the day of the week is Monday (2) through Friday (6).
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#dayofweek
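A minimal sketch of the If Condition expression, assuming the check runs in the pipeline rather than in a data flow. Note the numbering differs between the two expression languages: the data flow dayOfWeek() linked above returns 1 (Sunday) through 7 (Saturday), while the pipeline dayOfWeek() returns 0 (Sunday) through 6 (Saturday), so Monday through Friday is 1-5 there:

@and(greaterOrEquals(dayOfWeek(utcnow()), 1), lessOrEquals(dayOfWeek(utcnow()), 5))

The existing copy-and-increment logic goes in the True activities; on Saturday and Sunday the expression is false, so the HWM is left untouched.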