I have a pipeline in Azure Data Factory (ADF) that takes fields from a JSON file and saves them to multiple tables in my database. The pipeline is triggered when a file is uploaded to a container in my storage account, and after the run the files are deleted.
The problem is that when the pipeline fails for whatever reason, the files don't get deleted (because the run never reaches that step), yet the SQL records are still saved to some of the tables, so my tables get spammed with data. Is there any way to only commit the data if the pipeline succeeds?
I didn't see any option that allows me to do that, so I currently have to clean up manually whenever it happens.
2 Answers
Usually, when we trigger one activity from another, we use the success dependency. You can also use a failure path (like a catch block) to run another activity when an activity fails. Refer to this documentation.
So, in case a failure happens in a certain activity, use this method to execute another activity that clears out the SQL records that were already inserted.
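For example, the failure path could point to a Stored Procedure activity that calls a cleanup procedure like the sketch below. This is only a minimal sketch: the table, column, and procedure names are assumptions, and it presumes each insert stamps rows with the pipeline run ID (e.g. a copy activity additional column set to @pipeline().RunId).

```sql
-- Minimal cleanup sketch; dbo.TableA/TableB, PipelineRunId and the procedure
-- name are placeholders, not names from the original pipeline.
CREATE OR ALTER PROCEDURE dbo.usp_CleanupFailedRun
    @PipelineRunId NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove whatever the failed run managed to insert into each target table.
    DELETE FROM dbo.TableA WHERE PipelineRunId = @PipelineRunId;
    DELETE FROM dbo.TableB WHERE PipelineRunId = @PipelineRunId;
END;
```

In the Stored Procedure activity on the failure path, pass @pipeline().RunId as the @PipelineRunId parameter so only the current run's rows are removed.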
Adding to @Anupam Chand’s answer,
There is no direct way to roll back or delete records that the current pipeline run has already inserted into a SQL table. You need a flag or watermark column in the SQL table to identify the records inserted by the current run. On failure of the pipeline activities, you can add the delete logic in the failure path. I tried to repro this; below is the approach.
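As a concrete illustration of that watermark idea (the object names below are illustrative, not taken from the repro): add a run-identifier column to each target table, populate it from the system variable @pipeline().RunId (for example as an additional column in the copy activity), and in the failure path run a parameterized delete, e.g. from a Script activity.

```sql
-- Illustrative target table with a watermark column recording which
-- pipeline run inserted each row.
CREATE TABLE dbo.TargetTable
(
    Id            INT IDENTITY(1, 1) PRIMARY KEY,
    SomeField     NVARCHAR(200) NULL,
    PipelineRunId NVARCHAR(100) NOT NULL   -- filled from @pipeline().RunId
);

-- Statement executed in the failure path; @RunId is supplied as a parameter
-- whose value is the expression @pipeline().RunId.
DELETE FROM dbo.TargetTable
WHERE PipelineRunId = @RunId;
```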
This way, we can roll back the records that were partially loaded when the pipeline fails.