A pipeline loads data from an API into ADLS Gen2. The file name is fixed, so every run of the pipeline overwrites the same file.
I need the number of rows in that file for some calculations in the next activity; in other words, I want the row count exposed as an activity output that the next activity can consume as an input parameter.
How can I achieve this? Please suggest an approach for both .csv and .parquet files, and indicate which is more convenient to implement.
2 Answers
If the file is small or has few rows, you can use a Lookup activity to get the count of rows in the file.
Lookup activity limitation: the Lookup activity returns a maximum of 5,000 rows and supports a maximum size of 4 MB.
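With "First row only" unchecked, the Lookup activity's output includes a count property alongside the rows, so (assuming the activity is named 'Lookup1', an illustrative name) the next activity can read the row count with:
@activity('Lookup1').output.count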
If the file is larger than that, you can copy the file's data into a SQL database; the Copy activity's output then reports the row counts.
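For example, assuming the copy step is named 'Copy data1' (an illustrative name), a subsequent activity can read the number of rows written with:
@activity('Copy data1').output.rowsCopied
The Copy activity's output also exposes rowsRead if you want the count of rows read from the source file instead.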
If you can't use SQL, then you would have to implement custom logic in Databricks, Azure Batch, Azure Functions, etc. to determine the count, along the lines of the sketch below.
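As a minimal sketch of such custom logic (assuming the file has been downloaded or mounted to a local path, and that pyarrow is installed; all names and paths here are illustrative), something like this could run in an Azure Function or a Databricks notebook:

```python
# Illustrative sketch: count rows in a CSV or Parquet file.
# Assumes the ADLS Gen2 file is accessible at a local/mounted path.
import csv
import pyarrow.parquet as pq

def count_csv_rows(path: str, has_header: bool = True) -> int:
    # Stream the file so large CSVs need not fit in memory;
    # csv.reader correctly handles quoted fields with embedded newlines.
    with open(path, newline="") as f:
        total = sum(1 for _ in csv.reader(f))
    return total - 1 if has_header else total

def count_parquet_rows(path: str) -> int:
    # Parquet stores the row count in its footer metadata,
    # so this reads no data pages at all.
    return pq.ParquetFile(path).metadata.num_rows
```

Note that the Parquet case is much cheaper, since the count comes from file metadata rather than a full scan; this also bears on which format is more convenient for this requirement.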
Since you want the number of rows in the file and the Lookup activity is limited to 5,000 rows and 4 MB, you can instead use a Data flow activity, built as follows:
1. Add a Surrogate Key transformation that generates a rownum column with start and step values of 1.
2. Add a constant column with value 1.
3. In an Aggregate transformation, compute count(rownum).
4. Write the aggregate to a cache sink (named sink1 here) with "Write to activity output" enabled, and select Logging = None in the Data flow activity's settings (cache sinks require this).
The Data flow activity's output will then contain the count under runStatus.output.sink1. You can access the value in the next activity with this expression (with the Data flow activity named 'Data flow2' and the cache sink named 'sink1'):
@activity('Data flow2').output.runStatus.output.sink1.value[0].count
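For example, to hand the count to later activities through a pipeline variable (pipeline variables are of type String, Boolean or Array, so the numeric count must be converted), a Set Variable activity on a String variable could use the illustrative expression:
@string(activity('Data flow2').output.runStatus.output.sink1.value[0].count)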