
I have a dataset set up in my ADF solution which points to a blob container. There are 25 CSV files sitting in this container.

I have a dataflow set up in ADF which currently points to each .CSV file respectively. For example, one CSV is called Campus.csv, however this file can arrive in the container as Campus20-02-24.csv – the date portion will always change.

The ADF Dataflow is set up as shown below.

[Screenshot: ADF data flow]

Now, the source connection points to the container and not to any specific CSV in that container. However, I want it to connect to the Campus file named like Campus20-02-24.csv, so I am trying to understand the wildcard path syntax to point to this source file, i.e. the location shown below.

[Screenshot: source options, wildcard path]

I have tried using /*/Campus20????????.csv, on the basis that the Campus file will always be Campus20 plus 8 date characters plus .csv.

This is the error I am getting when attempting to data preview the CSV:

[Screenshot: error when data previewing the results]

What syntax should I be using here, please?

I have tried using combinations of the container, the blob, and a wildcard-style CSV file name.

I have attempted something like this, with no success.

2 Answers


  1. Instead of a backslash in *Campus20????????.csv, please try forward slashes: /*/Campus20????????.csv. In the question you mentioned that you tried /*/Campus20????????.csv, but in the screenshot it's different.
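    For reference, a minimal sketch of how that wildcard could appear in the source's data flow script view (Source1 is a placeholder name and the other option values are assumptions; the path is interpreted relative to the container your dataset points to):

      source(allowSchemaDrift: true,
          validateSchema: false,
          ignoreNoFilesFound: false,
          wildcardPaths:['/*/Campus20????????.csv']) ~> Source1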

  2. If you want to merge all the source files and transform them into a single target file, you can go ahead with @AnnuKumari's answer.
    But you have mentioned that your file name pattern is like Campus20-02-24.csv, where there are only 6 characters after the 20. If 24 represents the year, change the pattern to */Campus??????24.csv.

    I have a dataflow set up in ADF which currently points to each .CSV file respectively

    If you want to transform each CSV file into a target file in each iteration, you can try the approach below.

    • First, use a new dataflow to get the list of file paths. Give the same wildcard path */Campus??????24.csv in the source and set a column to store the file path (e.g. filepath), like below. This will generate a new column that stores the file path of each CSV file in its rows.


    • Next, use an aggregate transformation, group by the filepath column, and create a sample aggregate column, such as a count on any column.

    • Use a select transformation to remove the extra count column, and add the sink after it.

    • In the sink, select Cache and tick Write to activity output. The dataflow output will then give the required list of file paths from the nested folders (a data flow script sketch of these steps follows this list).
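    A rough data flow script sketch of those four steps, assuming the file-path column is called filepath, the transformations keep their default names, and the cache sink is called CacheSink (adjust the wildcard and names to your layout):

      source(allowSchemaDrift: true,
          validateSchema: false,
          ignoreNoFilesFound: false,
          rowUrlColumn: 'filepath',
          wildcardPaths:['*/Campus??????24.csv']) ~> Source1
      Source1 aggregate(groupBy(filepath),
          rowcount = count()) ~> Aggregate1
      Aggregate1 select(mapColumn(filepath),
          skipDuplicateMapInputs: true,
          skipDuplicateMapOutputs: true) ~> Select1
      Select1 sink(validateSchema: false,
          skipDuplicateMapInputs: true,
          skipDuplicateMapOutputs: true,
          store: 'cache',
          format: 'inline',
          output: true,
          saveOrder: 1) ~> CacheSink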

    Use this expression in the pipeline to get the file paths array from the dataflow output:

    @activity('Data flow1').output.runStatus.output.<sink_name>.value
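    For illustration only, assuming the cache sink is named CacheSink and the stored column is filepath, the relevant part of the dataflow activity output would look roughly like this, which is what the expression above drills into:

      {
        "runStatus": {
          "output": {
            "CacheSink": {
              "value": [
                { "filepath": "folder1/Campus20-02-24.csv" },
                { "filepath": "folder2/Campus21-03-24.csv" }
              ]
            }
          }
        }
      }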
    

    Go through this SO answer for the detailed step-by-step procedure for the above.

    Now, use this array to loop through the files and process each CSV file as per your requirement in each iteration. Give the above expression to a ForEach activity.


    Inside the ForEach activity, add your dataflow activity. In your dataflow, go to your source dataset, create a string-type parameter, and pass it to the file name as @dataset().filepath, like below.


    In the dataflow activity, you will be asked for the value of this parameter. Pass each file name using @item().filepath inside the for-loop, like below.
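    Putting the wiring together, here is a hedged sketch of the relevant pipeline JSON (the activity, dataflow, source-stream and parameter names such as ForEach1, dataflow2, source1 and filepath are placeholders, not values from your pipeline):

      {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [ { "activity": "Data flow1", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": {
            "value": "@activity('Data flow1').output.runStatus.output.CacheSink.value",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "Data flow2",
              "type": "ExecuteDataFlow",
              "typeProperties": {
                "dataflow": {
                  "referenceName": "dataflow2",
                  "type": "DataFlowReference",
                  "datasetParameters": {
                    "source1": {
                      "filepath": { "value": "@item().filepath", "type": "Expression" }
                    }
                  }
                }
              }
            }
          ]
        }
      }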


    The first dataflow will generate the list of file paths, which is passed to the ForEach activity. Each CSV file will then be processed by your dataflow through this for-loop.
