skip to Main Content

I would like to delete the bottom two rows of an excel file in ADF, but I don’t know how to do it.
The flow I am thinking of is this.
enter image description here
*I intend to filter -> delete the rows to be deleted in yellow.

The file has over 40,000 rows of data and is updated once a month. (The number of rows changes with each update, so the condition must be specified with a function.)

The contents of the file are also shown here.
The bottom two lines contain spaces and asterisks.
enter image description here
Any help would be appreciated.
I’m new to Azure and having trouble.
I need your help.

2

Answers


  1. Add a surrogate key transformation to put a row number on each row. Add a new branch to duplicate the stream and in that new branch, add an aggregate.

    Use the aggregate transformation to find the max() value of the surrogate key counter.

    Then subtract 2 from that max number and filter for just the rows up to that max-2.

    Login or Signup to reply.
  2. Let me provide a more detailed answer here … I think I can get it in here without writing a separate blog.

    The simplest way to filter out the final 2 rows is a pattern depicted in the screenshot here. Instead of the new branch, I just created 2 sources both pointing to the same data source. The 2nd stream is there just to get a row count and store it in a cached sink. For the aggregation expression I used this: "count(1)" as the row count aggregator.

    In the first stream, that is the primary data processing stream, I add a Surrogate Key transformation so that I can have a row number for each row. I called my key column "sk".

    Finally, set the Filter transformation to only allow rows with a row number <= the max row count from the cached sink minus 2.

    The Filter expression looks like this: sk <= cachedSink#output().rowcount-2

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search