skip to Main Content

I am loading CSV files to Redshift. I first run the COPY command to a staging table, then normalize the account data into a separate lookup table.

BEGIN;
    INSERT INTO AccountLookup (Account)
    SELECT DISTINCT Account FROM Staging db
    WHERE NOT EXISTS (SELECT 1 FROM AccountLookup acct WHERE acct.Account = db.Account);
END;

I expect the above query to not insert any duplicate rows into the AccountLookup table. It works most of the time. But I rarely do find some duplicate entries which is causing some problems.

Next, I am using the following query to transfer the data from staging table to main table. Here I perform join on the lookup table to get the account id. Since the lookup table has duplicate entries, the main table also gets duplicate entries which it shouldn’t have.

BEGIN;
    INSERT INTO Main (Usage, AccountId, Cost)
    SELECT bd.Usage, acct.Id, bd.Cost FROM Staging bd
    LEFT JOIN AccountLookup acct ON bd.Account = acct.Account;
END;

I can eliminate duplicates in the Main table by using the row_number() window function. But I would like to avoid inserting duplicate entries in the lookup table. Any help would be appreciated.

2

Answers


  1. Try this:

    BEGIN;
        INSERT INTO Main (Usage, AccountId, Cost)
        SELECT bd.Usage, acct.Id, bd.Cost FROM Staging bd
        LEFT JOIN AccountLookup acct ON bd.Account = acct.Account
        GROUP BY bd.Usage, acct.Id, bd.Cost;
    END;
    
    Login or Signup to reply.
  2. You should only INSERT DISTINCT rows, that are not in the main TABLE, which you can ensure with EXISTS

    You should alos do that for the first query

    BEGIN;
        INSERT INTO Main (Usage, AccountId, Cost)
        SELECT DISTINCT bd.Usage, acct.Id, bd.Cost FROM Staging bd
        LEFT JOIN AccountLookup acct ON bd.Account = acct.Account
        WHERE NOT EXISTS (SELECT 1 FROM Main  
            WHERE main.Usage = bd.Usage AND main.AccountId = acct.Id 
             AND main.Cost =  bd.Cost);
    END;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search