Inside a Azure synapse pipeline, I need to find and compare the row count of the CSV file with summary file stored in Azure Blob storage.
If the comparison fails then pipeline will stop, if the counts match then can proceed ahead in the pipeline.
pipeline, which components
2
Answers
Given my CSV is more than 10 Mill rows and in Synapse, I also found an alternative: Azure databricks notebook call from azure data factory based on if/else flag
If your rows are less than 5000, you can use lookup activities for this requirement which will give the required row count of the files.
Create datasets for your files and give those to two lookup activities like below. Make sure you uncheck the First row only option in both activities.
Now, take an if activity and use the below expression to compare both row counts.
If the condition satisfies, you can add the further activities in the True activities of if activity. Otherwise, you can stop the pipeline by not giving any activity in the False activities of if activity.
If the row count in your control file is like below sample.
Then, compare the count property from the control file lookup activity output array with the CSV file row count.
And, if the CSV file count is more than 5000, you can use ADF dataflow for this.
Get the row count of CSV file using below transformations in the Dataflow.
rowcount
and give some value.count(rowcount)
After dataflow activity, compare the row count of the CSV file with your control file count using if activity.