I am trying to delete folders that are older than 7 days from ADLS. The folders I want to delete have date as name and are under subfolders of the main folder. Please see below example for folder structure (raw is my container)
/raw/MainFolder/SubfolderA/20230430/File.csv
/raw/MainFolder/SubfolderA/20230415/File.csv
/raw/MainFolder/SubfolderA/20230410/File.csv
/raw/MainFolder/SubfolderB/20230430/File.csv
/raw/MainFolder/SubfolderB/20230420/File.csv
/raw/MainFolder/SubfolderB/20230405/File.csv
I need to delete all the folders that have the date as name and is more than 7 days old.
In this example the pipeline should delete 2 folders from SubFolderA (20230415, 20230410) and 2 folders from SubFolderB (20230420, 20230405).
How do I create a pipeline in Azure Data Factory that dynamically deletes these old folders using a GetMetadata and Delete Activity?
I created a pipeline with GetMetadata activity and am able to see the folders under MainFolder in debug output. But I need help with making it dynamic so that it will fetch all the folders under the subfolders and creating the delete activity.
2
Answers
You can try using the following approach:
Create a pipeline parameter ,say,
Iteration
withInt
datatype and provide default value as -7 .Use ForEach activity and provide expression in the items as :
@createarray(0,1,2,3,4,5,6,7)
Inside foreach, use delete activity . In the source settings, point your dataset to
raw/MainFolder/
level and use wildcard file path, and expression like this:@concat(addDays(utcNow(),add(pipeline().parameters.iteration,item()),'subFolder*/yyyyMMdd*'),'A.parquet')
You can modify the above expression as per your file path .
I am able to achieve your requirement like below.
This is my folder structure:
As you want to delete the folders which are more than 7 days old, first I have created a dates array using a ForEach with
@range(0,7)
. This expression gives the array[0,1,2,3,4,5,6]
.Inside ForEach I have used append variable activity to an array to append the date in
yyyyMMdd
format with the below expression.This gives the dates array for the last 7 days list as below.
This is my pipeline flow:
Use a Get Meta data activity first to get the subfolders list(
SubfolderA,SubfolderA
) and pass this child items array to ForEach.@item().name
) to get the date folders list.Use the dataset with a parameter like below:
My Parent pipeline JSON:
Child Pipeline JSON:
Folders before pipeline execution:
You can see the folders which are more than 7 days folders were deleted after pipeline execution.