I am trying to UPDATE target table but in source table I have duplicate data. I am trying in Azure SQL Server database. All will be ok with HASHBYTES but if I remove HASHBYTES it will raise an error:
drop table if exists #source
drop table if exists #target
CREATE TABLE #target(id int IDENTITY(1,1) NOT NULL,
a int NOT NULL,
b int NOT NULL,
c varchar(20) NOT NULL,
d decimal(10,3) NULL,
PRIMARY KEY (id))
CREATE TABLE #source(a int NOT NULL,
b int NOT NULL,
c varchar(20) NOT NULL,
d decimal(10,3) NULL)
INSERT #target (a, b, c, d) VALUES (1, 1, 'TEXT', NULL)
INSERT #source (a, b, c, d) VALUES (2, 2, 'TEST1', NULL), (2, 2, 'TEST2', NULL)
go
select a, b, count(*) as NumberOfRows
from #source
group by a, b
having count(*)>1
MERGE #target t
USING #source s ON t.a = s.a AND t.b = s.b
WHEN NOT MATCHED BY TARGET THEN
INSERT (a, b, c, d) VALUES(s.a, s.b, s.c, s.d)
WHEN MATCHED and hashbytes('SHA2_512', CONCAT(t.c, t.d)) != hashbytes('SHA2_512', CONCAT(s.c, s.d))
THEN UPDATE
set t.c=s.c
, t.d=s.d
;
select * from #target
I expect to have an error when source contain duplicate data.
When you run code first time all will be good, but when you ran MERGE again with HASHBYTES you need to have an error.
With HASHBYTES function MERGE execute successful without error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
Do you know why I don’t get error with HASHBYTES function?
Thanks in advance.
2
Answers
The problem is that the first half of your
MERGE
statement is saying "match records based ona
andb
".source
has two records that have the same value fora
andb
, whiletarget
has no record matching, which means both records will be inserted intotarget
.The second half handles what happens when there is a matching record, and the second time you run the
MERGE
statement, it hits that half.MERGE
looks at the incoming row fromsource
, and finds two matching rows intarget
; without theHASHBYTES
function, it has no way of knowing which of thosetarget
rows to update, and therefore throws an error.When including
HASHBYTES
, it adds an extra criteria to the matching – now, it looks at those two rows intarget
, and determines that neither row satisfy the extra criteria, and so it simply doesn’t do any update at all. Note that in this specific caseHASHBYTES
is overkill – you could easily have saidWHEN MATCHED AND CONCAT(t.c, t.d) != CONCAT(s.c, s.d)
.All that said – I question whether you’re really solving your original problem. This approach will not dedupe
source
. You may want to consider cleaning upsource
before doing yourMERGE
.Your problem is that
CONCAT
withNULL
will give back also aNULL
Add some values to the column d and you get not matched
fiddle