I have an Excel file as a source that needs to be copied into an Azure SQL database using Azure Data Factory (ADF).

The ADF pipeline should copy a row from the Excel source to the SQL database only if that row does not already exist in the database. If it already exists in the SQL database, no action should be taken.

I'm looking for the most efficient solution.

2 Answers


  1. You can achieve this with an Azure Data Factory mapping data flow: join the source data with the sink data, then filter the joined rows so that only rows that do not yet exist in the sink database are inserted.

    Example:

    1. Connect the Excel source dataset to a Source transformation in the data flow.

    (Screenshot: Source preview)

    2. (Optional) Transform the source data if required, for example with a Derived Column transformation.

    3. Add a second Source transformation and connect it to the sink dataset (the Azure SQL database). In the Source options, select Table if you are comparing all columns of the sink table with the source data, or select Query and write a query that returns only the columns used for matching.

    (Screenshot: Source2 output)

    4. Join source1 and source2 with a Join transformation, set the join type to Left outer, and add join conditions on the key columns that identify a row (for example, Id).

    (Screenshot: Join output)

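    5. Add a Filter transformation to remove the rows that already exist in the sink. After a left outer join, an existing row has a non-null value in the source2 key column, so keep only the rows where that column is null: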

    Filter condition: isNull(source2@Id)==true()

    (Screenshot: Filter output)

    6. Use a Select transformation to drop the duplicate columns (such as the source2 columns). Alternatively, remove them manually in the sink mapping by deleting the duplicate mapping entries.

    7. Add a Sink transformation connected to the sink dataset (the Azure SQL database) so that only the new rows are inserted.
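
    The Left outer join followed by the isNull() filter is a left anti-join: keep every source row that found no match in the sink. For readers more used to SQL, here is a minimal T-SQL sketch of the same logic. It is a swapped-in staging approach, not the data flow itself: it assumes the Excel rows have already been landed in a staging table, and the temporary tables #ExcelStaging and #Customer, the Id key, and the sample rows are all hypothetical stand-ins.

    ```sql
    -- Hypothetical stand-ins: #ExcelStaging holds the rows read from the Excel file,
    -- #Customer plays the role of the target table. Both are keyed on Id.
    CREATE TABLE #ExcelStaging (Id int NOT NULL PRIMARY KEY, Name nvarchar(100) NULL);
    CREATE TABLE #Customer     (Id int NOT NULL PRIMARY KEY, Name nvarchar(100) NULL);

    -- Illustrative rows: Id 1 already exists in the target, Id 2 is new.
    INSERT INTO #Customer (Id, Name) VALUES (1, N'Existing');
    INSERT INTO #ExcelStaging (Id, Name) VALUES (1, N'Existing'), (2, N'New');

    -- Left outer join + IS NULL filter (a left anti-join), mirroring the
    -- Join (Left outer) and Filter isNull(source2@Id) steps of the data flow.
    INSERT INTO #Customer (Id, Name)
    SELECT s.Id, s.Name
    FROM #ExcelStaging AS s
    LEFT OUTER JOIN #Customer AS t
        ON t.Id = s.Id
    WHERE t.Id IS NULL;
    ```

    Only the row with Id 2 is inserted; the row with Id 1 is left untouched, which is the "no action if it exists" behavior the question asks for.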

  2. Use a Copy activity with a stored procedure as the sink. Write the logic in the stored procedure (e.g. MERGE or INSERT ... WHERE NOT EXISTS ...) to handle both the case where the record already exists and the case where it does not.

    An example of a MERGE procedure from the documentation (note that it also updates rows that already exist):

    CREATE PROCEDURE usp_OverwriteMarketing
        @Marketing [dbo].[MarketingType] READONLY, 
        @category varchar(256)
    AS
    BEGIN
        MERGE [dbo].[Marketing] AS target
        USING @Marketing AS source
        ON (target.ProfileID = source.ProfileID and target.Category = @category)
        WHEN MATCHED THEN
            UPDATE SET State = source.State
        WHEN NOT MATCHED THEN
            INSERT (ProfileID, State, Category)
            VALUES (source.ProfileID, source.State, source.Category);
    END
    

    This article runs through the process in more detail.
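
    A minimal sketch of the INSERT ... WHERE NOT EXISTS variant, which matches the question's requirement (insert new rows, do nothing for existing ones). All object names (dbo.Customer, dbo.CustomerType, dbo.usp_InsertNewCustomers) and the Id key are hypothetical. The Copy activity sink would reference the procedure, its table type, and its table-valued parameter in the sink's stored procedure settings. GO is the sqlcmd/SSMS batch separator; it is needed because CREATE PROCEDURE must start its own batch.

    ```sql
    -- Hypothetical target table; replace with your own table and key columns.
    CREATE TABLE dbo.Customer
    (
        Id   int           NOT NULL PRIMARY KEY,
        Name nvarchar(100) NULL
    );
    GO

    -- Hypothetical table type used to pass the Excel rows into the procedure.
    CREATE TYPE dbo.CustomerType AS TABLE
    (
        Id   int           NOT NULL,
        Name nvarchar(100) NULL
    );
    GO

    -- Insert-only procedure: rows whose Id already exists in dbo.Customer are skipped.
    -- Duplicate Ids inside a single batch are not handled and would violate the primary key.
    CREATE PROCEDURE dbo.usp_InsertNewCustomers
        @Customers dbo.CustomerType READONLY
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.Customer (Id, Name)
        SELECT s.Id, s.Name
        FROM @Customers AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer AS t WHERE t.Id = s.Id);
    END
    GO

    -- Usage: pass the same batch twice; the second call inserts nothing.
    DECLARE @batch dbo.CustomerType;
    INSERT INTO @batch (Id, Name) VALUES (1, N'Alice'), (2, N'Bob');
    EXEC dbo.usp_InsertNewCustomers @Customers = @batch;
    EXEC dbo.usp_InsertNewCustomers @Customers = @batch;
    ```

    Removing the WHEN MATCHED branch from the MERGE example above would give the same insert-only behavior. The NOT EXISTS check assumes one writer at a time; if several loads could run concurrently, extra locking would be needed to avoid duplicate-key errors.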
