I have two tables – accounts, inf_accounts.
There is a daily update on millions of rows from inf_accounts to accounts.
I tried two updates:
UPDATE accounts acc SET account_status=(SELECT account_status FROM inf_accounts inf WHERE acc.account_number = inf.account_number);
This one performs kind of OK – 10M rows update in 5-7 mins.
Problem is, it doesn’t work as expected. When there are rows in accounts table not to be found in inf_accounts, it obviously crashes.
UPDATE accounts acc SET account_status=inf.account_status FROM inf_accounts inf WHERE acc.account_number = inf.account_number;
This one works ok but performs horribly – 10M rows update in dozens of minutes.
Is there a way to do it better?
2
Answers
In your second example statement try appending
to the WHERE clause. This will make the DBMS refrain from updating rows where the status is already set correctly. If that is most of the rows it will save a ton of time.
One way to approach this is by looking at the data set to be accessed when the
UPDATE
is performed. You can do this by looking at the results ofSELECT
on the same parameters, and pass throughEXPLAIN
:This will likely reveal whether an index is being used to compare
acc.account_number
andinf.account_number
. If not, then you will likely be taking a long time to join the two tables to determine which rows ofaccounts
to update.Additionally, as
O. Jones
mentioned in the other answer, there is no limiting of which rows inaccounts
to update. WithoutAND acc.account_status <> inf.account_status
, you will setacc.account_status
even if it is already equal toinf.account_status
, resulting in an unnecessary write, and thereby taking up extra time.If you haven’t already, be sure to index both the
account_status
andaccount_number
columns on both tables.EXPLAIN
may help you identify the actual bottleneck in theUPDATE
call (note thatEXPLAIN
will not run the query, but only show the execution plan. If you want to see the actual runtime performance, useEXPLAIN ANALYZE
instead)