skip to Main Content

I am trying to delete folders that are older than 7 days from ADLS. The folders I want to delete have date as name and are under subfolders of the main folder. Please see below example for folder structure (raw is my container)

/raw/MainFolder/SubfolderA/20230430/File.csv
/raw/MainFolder/SubfolderA/20230415/File.csv
/raw/MainFolder/SubfolderA/20230410/File.csv

/raw/MainFolder/SubfolderB/20230430/File.csv
/raw/MainFolder/SubfolderB/20230420/File.csv
/raw/MainFolder/SubfolderB/20230405/File.csv

I need to delete all the folders that have the date as name and is more than 7 days old.
In this example the pipeline should delete 2 folders from SubFolderA (20230415, 20230410) and 2 folders from SubFolderB (20230420, 20230405).

How do I create a pipeline in Azure Data Factory that dynamically deletes these old folders using a GetMetadata and Delete Activity?

I created a pipeline with GetMetadata activity and am able to see the folders under MainFolder in debug output. But I need help with making it dynamic so that it will fetch all the folders under the subfolders and creating the delete activity.

2

Answers


  1. You can try using the following approach:

    • Create a pipeline parameter ,say, Iteration with Int datatype and provide default value as -7 .

    • Use ForEach activity and provide expression in the items as :

      @createarray(0,1,2,3,4,5,6,7)

    • Inside foreach, use delete activity . In the source settings, point your dataset to raw/MainFolder/ level and use wildcard file path, and expression like this:
      @concat(addDays(utcNow(),add(pipeline().parameters.iteration,item()),'subFolder*/yyyyMMdd*'),'A.parquet')

    You can modify the above expression as per your file path .

    Login or Signup to reply.
  2. I am able to achieve your requirement like below.

    This is my folder structure:

    raw
        MainFolder
            SubfolderA
                20230425
                    //files
                20230427
                    //files
                20230429
                    //files
                20230523
                    //files
            SubfolderB
                20230425
                    //files
                20230427
                    //files
                20230429
                    //files
                20230523
                    //files
                
    

    As you want to delete the folders which are more than 7 days old, first I have created a dates array using a ForEach with @range(0,7). This expression gives the array[0,1,2,3,4,5,6].

    Inside ForEach I have used append variable activity to an array to append the date in yyyyMMdd format with the below expression.

    @formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')
    

    This gives the dates array for the last 7 days list as below.

    enter image description here

    This is my pipeline flow:

    enter image description here

    Use a Get Meta data activity first to get the subfolders list(SubfolderA,SubfolderA) and pass this child items array to ForEach.

    • Inside ForEach, use another Get Meta data activity(in path give the @item().name) to get the date folders list.
    • Now, use filter on these child items. Here we are filtering the date folders by checking our dates array contains the folder name or not.
    • Get the child items which are more than 7 days from the filter. Here we need to iterate through this array. But ADF currently does not support nested Foreach. So, use Execute pipeline activity by passing the current subfolder name and its correspoding child items array.
    • In the child pipeline, iterate through the child items and use delete activity on it.

    Use the dataset with a parameter like below:

    enter image description here

    My Parent pipeline JSON:

    {
        "name": "parent",
        "properties": {
            "activities": [
                {
                    "name": "get subfolders",
                    "type": "GetMetadata",
                    "dependsOn": [
                        {
                            "activity": "ForEach1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "sourcecsv",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderpath": "MainFolder"
                            }
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    }
                },
                {
                    "name": "iterate subfolders",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "get subfolders",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('get subfolders').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "get date folders",
                                "type": "GetMetadata",
                                "dependsOn": [],
                                "policy": {
                                    "timeout": "0.12:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "dataset": {
                                        "referenceName": "sourcecsv",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "folderpath": {
                                                "value": "@concat('MainFolder/',item().name)",
                                                "type": "Expression"
                                            }
                                        }
                                    },
                                    "fieldList": [
                                        "childItems"
                                    ],
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                }
                            },
                            {
                                "name": "Execute Pipeline1",
                                "type": "ExecutePipeline",
                                "dependsOn": [
                                    {
                                        "activity": "Filter1",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "pipeline": {
                                        "referenceName": "child",
                                        "type": "PipelineReference"
                                    },
                                    "waitOnCompletion": true,
                                    "parameters": {
                                        "date_folder": {
                                            "value": "@activity('Filter1').output.value",
                                            "type": "Expression"
                                        },
                                        "path": {
                                            "value": "@concat('MainFolder/',item().name)",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            },
                            {
                                "name": "Filter1",
                                "type": "Filter",
                                "dependsOn": [
                                    {
                                        "activity": "get date folders",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "items": {
                                        "value": "@activity('get date folders').output.childItems",
                                        "type": "Expression"
                                    },
                                    "condition": {
                                        "value": "@not(contains(variables('daysarr'),item().name))",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@range(0,7)",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Append variable1",
                                "type": "AppendVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "daysarr",
                                    "value": {
                                        "value": "@formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "counter": {
                    "type": "String"
                },
                "daysarr": {
                    "type": "Array"
                },
                "temp": {
                    "type": "String"
                },
                "new": {
                    "type": "Array"
                }
            },
            "annotations": [],
            "lastPublishTime": "2023-05-02T07:27:09Z"
        },
        "type": "Microsoft.DataFactory/factories/pipelines"
    }
    

    Child Pipeline JSON:

    {
        "name": "child",
        "properties": {
            "activities": [
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.date_folder",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Delete1",
                                "type": "Delete",
                                "dependsOn": [],
                                "policy": {
                                    "timeout": "0.12:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "dataset": {
                                        "referenceName": "sourcecsv",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "folderpath": {
                                                "value": "@concat(pipeline().parameters.path,'/',item().name)",
                                                "type": "Expression"
                                            }
                                        }
                                    },
                                    "enableLogging": false,
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "parameters": {
                "date_folder": {
                    "type": "array"
                },
                "path": {
                    "type": "string"
                }
            },
            "annotations": []
        }
    }
    

    Folders before pipeline execution:

    enter image description here

    You can see the folders which are more than 7 days folders were deleted after pipeline execution.

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search