skip to Main Content

This is my first time make a post in here and appreciate any suggestion first!

Basically, I got two tables that one is Incremental table and the other is Final table.

In the First day loading, Incremental table is like below

ID Code
1 A
1 B
2 C

Therefore, Final table was empty firstly and then loaded the record from Incremental into Final table

ID Code
1 A
1 B
2 C

However, in the Second day, the Incremental table got changed.

ID Code
1 D

So that the Final table should be like this.

ID Code
1 D
2 C

If there are same ID in both Incremental and Final, the Incremental should replace all the same ID, such as ID = 1 got replaced with Incremental table.
If the ID only in the Final table but not in the Incremental table, it would keep the same row, such as ID = 2

And my goal is to make a query to find out the removed record

ID Code
1 A
1 B

(Each row can be view as unique row in both Incremental and Final table. And everyday each ID its Code could be changed, so that I need to update with the latest Code.)

I’m struggling with this issue for five days long and I can make a query for finding out Second day removed record, but it would have some issue on First day loading.(I’m writing this with PostgreSQL)

Any suggestion is appreciated !

2

Answers


  1. You can create a function or T-SQL procedure which can have below 2 queries.

    Delete * from FINAL where ID in ( select distinct ID from INCREMENTAL);

    Insert into FINAL (Select Id, Code from INCREMENTAL;

    These are just rough queries bu should give you basic idea

    Edit :
    To get the records which are going to get removed, you can add one more sql statement before Delete statement as

    Insert into SOME_OTHER_TABLE (Select * from FINAL where ID in ( select distinct ID from INCREMENTAL));

    Login or Signup to reply.
  2. This answer takes advantage of the OUTPUT clause available in SQL Server. However, this is not necessary just seems a bit cleaner.

    I would prefer to see some notion of a primary key on this data, but I’ll assume you have a good reason for what you are doing. However, this answer is predicated on the assumption that the ID column can NOT be NULL.

    Create relevant tables, and load inital data.

    --simulate your incremental table ,I have called this Staging
    DROP TABLE IF EXISTS Staging;
    CREATE TABLE Staging (ID BIGINT NOT NULL
                            , Code CHAR(1) NOT NULL
                            );
                            
    INSERT INTO Staging(ID
                        , Code
                        )
    VALUES(1, 'A')
        , (1, 'B')
        , (2, 'C');
        
        
    --simulate your next table (final table). I've called this Persisted
    DROP TABLE IF EXISTS Persisted;
    CREATE TABLE Persisted(ID BIGINT NOT NULL
                            , Code CHAR(1) NOT NULL
                            );
                                                
    --Create a table to store the deletions
    DROP TABLE IF EXISTS PersistedDeletes
    CREATE TABLE PersistedDeletes(ID BIGINT NOT NULL
                                , Code CHAR(1) NOT NULL
                                , DeletedOn DATETIME2(7) NOT NULL
                                );
    

    Next we define a procedure to move data from Staging to Persisted in the manner you’ve described.

    CREATE PROCEDURE LoadStagingToPersisted
    AS
    BEGIN
        --Two Atomic DML operations are happening to Persisted. 
        --So let's wrap this in a transaction
        SET XACT_ABORT ON;
        BEGIN TRANSACTION 
            --Compare Staging with Persisted
            CREATE TABLE #Comparison(ID BIGINT NOT NULL
                                , InSourceNotInTarget BIT NOT NULL
                                , InTargetNotInSource BIT NOT NULL
                                , InBothSourceAndTarget BIT NOT NULL
                                );
        
            INSERT INTO #Comparison(ID
                                , InSourceNotInTarget
                                , InTargetNotInSource
                                , InBothSourceAndTarget
                                )
            --Assumption 1: ID is never NULL. so it can only be NULL via a left/right join
            SELECT COALESCE(src.ID, tgt.ID) AS ID
                , CASE 
                    WHEN tgt.ID IS NULL 
                    THEN 1
                    ELSE 0
                END AS InSourceNotInTarget
                , CASE 
                    WHEN src.ID IS NULL 
                    THEN 1
                    ELSE 0
                END AS InTargetNotInSource
                , CASE 
                    WHEN src.ID IS NOT NULL AND tgt.ID IS NOT NULL 
                    THEN 1
                    ELSE 0
                END AS InBothSourceAndTarget
            FROM Persisted tgt
            FULL JOIN Staging src
                ON tgt.ID = src.ID
    
            --Delete from Persisted all IDs which match between Staging and Persisted.
            --Output the deleted rows into the table PersistedDeletes.
            DELETE per
            OUTPUT DELETED.ID
                , DELETED.Code
                , GETDATE()
            INTO PersistedDeletes
            FROM Persisted per
            INNER JOIN (
                SELECT ID 
                FROM #Comparison
                WHERE InBothSourceAndTarget = 1
                GROUP By ID
                ) del
                ON per.ID = del.ID
            ;
    
    
            --Now we insert all the new records, as well as those that replace the previous deletes.
            INSERT INTO Persisted(ID
                                , Code
                                )
            SELECT src.ID
                , src.Code
            FROM Staging src
            INNER JOIN (
                SELECT ID
                FROM #Comparison
                WHERE InSourceNotInTarget = 1
                    OR InBothSourceAndTarget = 1
                GROUP BY ID
                ) ins
                ON src.ID = ins.ID;
        COMMIT
    END;
    

    Finally, use this procedure to load the data from Staging to Persisted. Then reload Staging with some new data, and run the procedure again.

    --Load Persisted
    EXEC LoadStagingToPersisted;
    SELECT * FROM Persisted;
    
    --simulate the next state of Staging
    TRUNCATE TABLE Staging;
    INSERT INTO Staging(ID
                    , Code
                    )
    VALUES (1, 'D');
    
    --Load Persisted again
    EXEC LoadStagingToPersisted;
    SELECT * FROM Persisted;
    --Also check what was deleted
    SELECT * FROM PersistedDeletes;
    

    Play around with the inputs yourself to confirm if this does what you want it to. Try it yourself here: db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search