I'm working on a simple ETL for our budget in MS Azure Data Factory. The pipeline takes an Excel file as its source and has an MS Azure SQL Database as its sink.

I've already done the integration and it works, but I'm stuck on the validation part, because the Excel file will be uploaded to an FTP server manually, so human error is a possibility.

I'm looking for a process that examines the "Date" and "Country" columns from my source (Excel) and checks whether that combination already exists in my Azure SQL database. If the combination already exists, the worksheet should not be uploaded; if it doesn't, it should be uploaded.

I was thinking of something like this:

I already tried the following activities: Lookup and If Condition, but it's getting complicated to handle null or empty fields.

This is the SQL syntax I have in mind:

SELECT *
FROM Budget
WHERE MONTH(ExcelDate) = MONTH(GETDATE())
  AND YEAR(ExcelDate) = YEAR(GETDATE())
  AND Country = 'AR'

But I have not been able to apply it successfully in an activity.
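
For illustration, a parameterized form of that query in a Lookup activity's dynamic content could look something like the sketch below, where country_code is a hypothetical pipeline parameter standing in for the value read from the file:

@concat('SELECT * FROM Budget WHERE MONTH(ExcelDate) = MONTH(GETDATE()) AND YEAR(ExcelDate) = YEAR(GETDATE()) AND Country = ''', pipeline().parameters.country_code, '''')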

Example

2 Answers


  1. Chosen as BEST ANSWER

    I solved it with the following solution: two Lookups + two Set Variables + an If Condition.

    Basically, one Lookup checks the source and the other checks the sink. Then I set two variables to compare against each other, and I put the Copy Data activity inside the False branch of the If Condition, using this function:

    @contains(variables('snk_date_arg'), variables('src_date_arg'))
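
    As an illustration only (Lookup_Source and Lookup_Snk are placeholder activity names, and the exact fields depend on your datasets), the two variables can be populated with Set Variable expressions along these lines, stringifying both sides so the contains() check works as a substring match:

    Set variable src_date_arg:

    @string(activity('Lookup_Source').output.firstRow.Date)

    Set variable snk_date_arg:

    @string(activity('Lookup_Snk').output.value)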

    Thanks for the ideas.

    Pipeline example


  2. I'm looking for a process that examines the "Date" and "Country" columns from my source (Excel) and checks whether that combination already exists in my Azure SQL db.

    If all the Date and Country values in your Excel file are the same, then try the method suggested by @All About BI in the comments.

    If you want to check all rows from the Excel file against the SQL table, then try the approach below. Note that this method only checks 5000 rows from the Excel file and 5000 rows from the SQL table.

    First, take two Lookup activities with the "First row only" option unchecked, one for the Excel file and another for the SQL table. Both activities return JSON arrays as output, and the Lookup limit is 5000 rows.

    In the SQL Lookup activity, select only the required columns using a query. My sample query:

    SELECT CONVERT(varchar(20), date, 120) AS Date, Country FROM sample1;
    


    Next, build the JSON array with the required fields (Date and Country) from the first Lookup's output array. For that, use a ForEach activity and give it the Lookup1 output array (check the Sequential option); see the sketch below.
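
    Assuming the Excel Lookup is the activity named Lookup1 (matching the Lookup2 name used for the SQL table later), the ForEach items expression would be:

    @activity('Lookup1').output.value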

    Inside the ForEach, take an Append Variable activity and build the JSON object like below:

    @json(concat('{"Date":"',item().Date,'","Country":"',item().Country,'"}'))
    


    After the ForEach, the JSON array will be built. If the intersection of this JSON array and the SQL Lookup output array has any elements, you should not perform the copy; if it has none, you should execute the Copy activity.

    So, after the ForEach, take a Set Variable activity with an array-type variable and find the intersection of the two arrays:

    @intersection(variables('excel_arr'),activity('Lookup2').output.value)
    


    Then check the length of this array in an If Condition activity: if the length is not greater than 0, put the Copy activity inside the True activities of the If and leave the False activities empty.
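
    Assuming the intersection result above is stored in a variable named intersect_arr (a placeholder name), the If Condition expression would be something like:

    @equals(length(variables('intersect_arr')), 0)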
