This is my first time make a post in here and appreciate any suggestion first!
Basically, I got two tables that one is Incremental table and the other is Final table.
In the First day loading, Incremental table is like below
ID | Code |
---|---|
1 | A |
1 | B |
2 | C |
Therefore, Final table was empty firstly and then loaded the record from Incremental into Final table
ID | Code |
---|---|
1 | A |
1 | B |
2 | C |
However, in the Second day, the Incremental table got changed.
ID | Code |
---|---|
1 | D |
So that the Final table should be like this.
ID | Code |
---|---|
1 | D |
2 | C |
If there are same ID in both Incremental and Final, the Incremental should replace all the same ID, such as ID = 1 got replaced with Incremental table.
If the ID only in the Final table but not in the Incremental table, it would keep the same row, such as ID = 2
And my goal is to make a query to find out the removed record
ID | Code |
---|---|
1 | A |
1 | B |
(Each row can be view as unique row in both Incremental and Final table. And everyday each ID its Code could be changed, so that I need to update with the latest Code.)
I’m struggling with this issue for five days long and I can make a query for finding out Second day removed record, but it would have some issue on First day loading.(I’m writing this with PostgreSQL)
Any suggestion is appreciated !
2
Answers
You can create a function or T-SQL procedure which can have below 2 queries.
Delete * from FINAL where ID in ( select distinct ID from INCREMENTAL);
Insert into FINAL (Select Id, Code from INCREMENTAL;
These are just rough queries bu should give you basic idea
Edit :
To get the records which are going to get removed, you can add one more sql statement before Delete statement as
Insert into SOME_OTHER_TABLE (Select * from FINAL where ID in ( select distinct ID from INCREMENTAL));
This answer takes advantage of the OUTPUT clause available in SQL Server. However, this is not necessary just seems a bit cleaner.
I would prefer to see some notion of a primary key on this data, but I’ll assume you have a good reason for what you are doing. However, this answer is predicated on the assumption that the ID column can NOT be NULL.
Create relevant tables, and load inital data.
Next we define a procedure to move data from Staging to Persisted in the manner you’ve described.
Finally, use this procedure to load the data from Staging to Persisted. Then reload Staging with some new data, and run the procedure again.
Play around with the inputs yourself to confirm if this does what you want it to. Try it yourself here: db<>fiddle