Postgresql - How to resolve this Incremental loading with SQL?

AliceHo
March 30, 2024
220 views
0 votes
2 Answers

This is my first time make a post in here and appreciate any suggestion first!

Basically, I got two tables that one is Incremental table and the other is Final table.

In the First day loading, Incremental table is like below

ID	Code
1	A
1	B
2	C

Therefore, Final table was empty firstly and then loaded the record from Incremental into Final table

ID	Code
1	A
1	B
2	C

However, in the Second day, the Incremental table got changed.

ID	Code
1	D

So that the Final table should be like this.

ID	Code
1	D
2	C

If there are same ID in both Incremental and Final, the Incremental should replace all the same ID, such as ID = 1 got replaced with Incremental table.
If the ID only in the Final table but not in the Incremental table, it would keep the same row, such as ID = 2

And my goal is to make a query to find out the removed record

ID	Code
1	A
1	B

(Each row can be view as unique row in both Incremental and Final table. And everyday each ID its Code could be changed, so that I need to update with the latest Code.)

I’m struggling with this issue for five days long and I can make a query for finding out Second day removed record, but it would have some issue on First day loading.(I’m writing this with PostgreSQL)

Any suggestion is appreciated !

Tags: postgresql sql

Answers

- ShashankKawle
- March 29, 2024 at 8:58 pm
- 0 votes
0
You can create a function or T-SQL procedure which can have below 2 queries.

Delete * from FINAL where ID in ( select distinct ID from INCREMENTAL);

Insert into FINAL (Select Id, Code from INCREMENTAL;

These are just rough queries bu should give you basic idea

Edit :
To get the records which are going to get removed, you can add one more sql statement before Delete statement as

Insert into SOME_OTHER_TABLE (Select * from FINAL where ID in ( select distinct ID from INCREMENTAL));

Login or Signup to reply.

This answer takes advantage of the OUTPUT clause available in SQL Server. However, this is not necessary just seems a bit cleaner.

I would prefer to see some notion of a primary key on this data, but I’ll assume you have a good reason for what you are doing. However, this answer is predicated on the assumption that the ID column can NOT be NULL.

Create relevant tables, and load inital data.

--simulate your incremental table ,I have called this Staging
DROP TABLE IF EXISTS Staging;
CREATE TABLE Staging (ID BIGINT NOT NULL
                        , Code CHAR(1) NOT NULL
                        );
                        
INSERT INTO Staging(ID
                    , Code
                    )
VALUES(1, 'A')
    , (1, 'B')
    , (2, 'C');
    
    
--simulate your next table (final table). I've called this Persisted
DROP TABLE IF EXISTS Persisted;
CREATE TABLE Persisted(ID BIGINT NOT NULL
                        , Code CHAR(1) NOT NULL
                        );
                                            
--Create a table to store the deletions
DROP TABLE IF EXISTS PersistedDeletes
CREATE TABLE PersistedDeletes(ID BIGINT NOT NULL
                            , Code CHAR(1) NOT NULL
                            , DeletedOn DATETIME2(7) NOT NULL
                            );

Next we define a procedure to move data from Staging to Persisted in the manner you’ve described.

CREATE PROCEDURE LoadStagingToPersisted
AS
BEGIN
    --Two Atomic DML operations are happening to Persisted. 
    --So let's wrap this in a transaction
    SET XACT_ABORT ON;
    BEGIN TRANSACTION 
        --Compare Staging with Persisted
        CREATE TABLE #Comparison(ID BIGINT NOT NULL
                            , InSourceNotInTarget BIT NOT NULL
                            , InTargetNotInSource BIT NOT NULL
                            , InBothSourceAndTarget BIT NOT NULL
                            );
    
        INSERT INTO #Comparison(ID
                            , InSourceNotInTarget
                            , InTargetNotInSource
                            , InBothSourceAndTarget
                            )
        --Assumption 1: ID is never NULL. so it can only be NULL via a left/right join
        SELECT COALESCE(src.ID, tgt.ID) AS ID
            , CASE 
                WHEN tgt.ID IS NULL 
                THEN 1
                ELSE 0
            END AS InSourceNotInTarget
            , CASE 
                WHEN src.ID IS NULL 
                THEN 1
                ELSE 0
            END AS InTargetNotInSource
            , CASE 
                WHEN src.ID IS NOT NULL AND tgt.ID IS NOT NULL 
                THEN 1
                ELSE 0
            END AS InBothSourceAndTarget
        FROM Persisted tgt
        FULL JOIN Staging src
            ON tgt.ID = src.ID

        --Delete from Persisted all IDs which match between Staging and Persisted.
        --Output the deleted rows into the table PersistedDeletes.
        DELETE per
        OUTPUT DELETED.ID
            , DELETED.Code
            , GETDATE()
        INTO PersistedDeletes
        FROM Persisted per
        INNER JOIN (
            SELECT ID 
            FROM #Comparison
            WHERE InBothSourceAndTarget = 1
            GROUP By ID
            ) del
            ON per.ID = del.ID
        ;


        --Now we insert all the new records, as well as those that replace the previous deletes.
        INSERT INTO Persisted(ID
                            , Code
                            )
        SELECT src.ID
            , src.Code
        FROM Staging src
        INNER JOIN (
            SELECT ID
            FROM #Comparison
            WHERE InSourceNotInTarget = 1
                OR InBothSourceAndTarget = 1
            GROUP BY ID
            ) ins
            ON src.ID = ins.ID;
    COMMIT
END;

Finally, use this procedure to load the data from Staging to Persisted. Then reload Staging with some new data, and run the procedure again.

--Load Persisted
EXEC LoadStagingToPersisted;
SELECT * FROM Persisted;

--simulate the next state of Staging
TRUNCATE TABLE Staging;
INSERT INTO Staging(ID
                , Code
                )
VALUES (1, 'D');

--Load Persisted again
EXEC LoadStagingToPersisted;
SELECT * FROM Persisted;
--Also check what was deleted
SELECT * FROM PersistedDeletes;

Play around with the inputs yourself to confirm if this does what you want it to. Try it yourself here: db<>fiddle

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to resolve this Incremental loading with SQL?

Answers