skip to Main Content

I would like to update values in one postgres table from values in another postgres table, but I would like to update only values which NEED to be updated – ie if a value is already correct, leave it alone (and dont overwrite it with the same value).

Its important I dont just overwrite with the same value because I want to be able to know if the values are already in sync, ie when I run the command it should say how many rows need to be updated. One further complication – in the values, there could be NULL values.

If I add a couple of WHERE clauses to the regular UPDATE command, I get unexpected behaviour:

In its simplest form, lets say I have 2 postgres tables:

create table table1
(
    id   int,
    key  int,
    val varchar
);

create table table2
(
    key  int,
    val varchar
);

insert into table1 (id, key, val) values (1, null, null);
insert into table1 (id, key, val) values (2, 2, null);
insert into table1 (id, key, val) values (3, 3, 'z');

insert into table2 (key, val) values (1, 'x');
insert into table2 (key, val) values (2, 'y');
insert into table2 (key, val) values (3, 'z');

I know I can update table1 with values from table 2 with the query:

update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key

However, I would like to update only values which NEED to be updated, ie if the existing value is correct, I don’t want to overwrite it.

Adding an AND clause: and t1.val != t2.val

works for places where t1.val isn’t null, but doesn’t update null values as null doesn’t fulfill the != clause.

When I try to add another clause to handle the null cases, I get unexpected results:

db=> select * from table1 order by id;
 id | key | val 
----+-----+-----
  1 |     | 
  2 |   2 | 
  3 |   3 | z
(3 rows)


db=> select * from table2 order by key;
 key | val 
-----+-----
   1 | x
   2 | y
   3 | z
(3 rows)

db=> update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key
and t1.val != t2.val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1

db=> select * from table1 order by id;
 id | key | val 
----+-----+-----
  1 |     | 
  2 |   2 | x
  3 |   3 | z
(3 rows)

For some reason the null value is getting updated to ‘x’ (the val2 of val = 1, instead of ‘y’).
And then if I run the same update again (without changing anything):

db=> update table1 t1
set val = t2.val
from table2 t2
where t1.key = t2.key
and t1.val != t2.val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1
db=> select * from table1 order by id;
 id | key | val 
----+-----+-----
  1 |     | 
  2 |   2 | y
  3 |   3 | z
(3 rows)

The erroneous ‘x’ value gets updated to the correct value (‘y’).

What is happening here??!

Note that I know I can do a more complicated joined update query:

db=> select * from table1 order by id;
 id | key | val 
----+-----+-----
  1 |     | 
  2 |   2 | 
  3 |   3 | z
(3 rows)

db=> with joined as (
    select t1.id, t1.key, t1.val, t2.key as t2key, t2.val as t2_val
    from table1 t1
    join table2 t2
    on t2.key = t1.key
)
update table1 t1
set val = joined.t2_val
    from joined
        where t1.key = joined.t2key
and t1.val != joined.t2_val
OR t1.val is NULL AND t1.key IS NOT NULL;
UPDATE 1
db=> select * from table1 order by id;
 id | key | val
----+-----+-----
  1 |     | 
  2 |   2 | y
  3 |   3 | z
(3 rows)

And immediately get the correct result with the right number of updated rows, but why doesn’t the simpler join work? What is happening??

2

Answers


  1. you can use is distinct from, which handles nulls

    update table1 t1
    set val = t2.val
    from table2 t2
    where t1.key = t2.key
    and t1.val is distinct from t2.val
    
    Login or Signup to reply.
  2. this will work

    UPDATE table1 t1
    SET val = joined.t2_val
    FROM joined
    WHERE t1.key = joined.t2key
    AND (t1.val != joined.t2_val OR t1.val IS NULL);
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search