I would like to update values in one postgres table from values in another postgres table, but I would like to update only values which NEED to be updated – ie if a value is already correct, leave it alone (and dont overwrite it with the same value).
Its important I dont just overwrite with the same value because I want to be able to know if the values are already in sync, ie when I run the command it should say how many rows need to be updated. One further complication – in the values, there could be NULL values.
If I add a couple of WHERE clauses to the regular UPDATE command, I get unexpected behaviour:
In its simplest form, lets say I have 2 postgres tables:
create table table1
(
id int,
key int,
val varchar
);
create table table2
(
key int,
val varchar
);
insert into table1 (id, key, val) values (1, null, null);
insert into table1 (id, key, val) values (2, 2, null);
insert into table1 (id, key, val) values (3, 3, 'z');
insert into table2 (key, val) values (1, 'x');
insert into table2 (key, val) values (2, 'y');
insert into table2 (key, val) values (3, 'z');
I know I can update table1 with values from table 2 with the query:
update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key
However, I would like to update only values which NEED to be updated, ie if the existing value is correct, I don’t want to overwrite it.
Adding an AND clause: and t1.val != t2.val
works for places where t1.val isn’t null, but doesn’t update null values as null doesn’t fulfill the != clause.
When I try to add another clause to handle the null cases, I get unexpected results:
db=> select * from table1 order by id;
id | key | val
----+-----+-----
1 | |
2 | 2 |
3 | 3 | z
(3 rows)
db=> select * from table2 order by key;
key | val
-----+-----
1 | x
2 | y
3 | z
(3 rows)
db=> update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key
and t1.val != t2.val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1
db=> select * from table1 order by id;
id | key | val
----+-----+-----
1 | |
2 | 2 | x
3 | 3 | z
(3 rows)
For some reason the null value is getting updated to ‘x’ (the val2 of val = 1, instead of ‘y’).
And then if I run the same update again (without changing anything):
db=> update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key
and t1.val != t2.val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1
db=> select * from table1 order by id;
id | key | val
----+-----+-----
1 | |
2 | 2 | y
3 | 3 | z
(3 rows)
The erroneous ‘x’ value gets updated to the correct value (‘y’).
What is happening here??!
Note that I know I can do a more complicated joined update query:
db=> select * from table1 order by id;
id | key | val
----+-----+-----
1 | |
2 | 2 |
3 | 3 | z
(3 rows)
db=> with joined as (
select t1.id, t1.key, t1.val, t2.key as t2key, t2.val as t2_val
from table1 t1
join table2 t2
on t2.key = t1.key
)
update table1 t1
set val = joined.t2_val
from joined
where t1.key = joined.t2key
and t1.val != joined.t2_val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1
db=> select * from table1 order by id;
id | key | val
----+-----+-----
1 | |
2 | 2 | y
3 | 3 | z
(3 rows)
And immediately get the correct result with the right number of updated rows, but why doesn’t the simpler join work? What is happening??
2
Answers
you can use
is distinct from
, which handles nullsthis will work