How to update 10M records in Postgresql

JiKra
November 9, 2023
300 views
0 votes
2 Answers

I have two tables – accounts, inf_accounts.
There is a daily update on millions of rows from inf_accounts to accounts.

I tried two updates:

UPDATE accounts acc SET account_status=(SELECT account_status FROM inf_accounts inf WHERE acc.account_number = inf.account_number);

This one performs kind of OK – 10M rows update in 5-7 mins.
Problem is, it doesn’t work as expected. When there are rows in accounts table not to be found in inf_accounts, it obviously crashes.

UPDATE accounts acc SET account_status=inf.account_status FROM inf_accounts inf WHERE acc.account_number = inf.account_number;

This one works ok but performs horribly – 10M rows update in dozens of minutes.

Is there a way to do it better?

Tags: postgresql

Answers

- OJones
- November 9, 2023 at 7:11 pm
- 0 votes
0
In your second example statement try appending
```
AND acc.account_status <> inf.account_status
```
to the WHERE clause. This will make the DBMS refrain from updating rows where the status is already set correctly. If that is most of the rows it will save a ton of time.
Login or Signup to reply.

- richyen
- November 9, 2023 at 7:26 pm
- 0 votes
0
One way to approach this is by looking at the data set to be accessed when the UPDATE is performed. You can do this by looking at the results of SELECT on the same parameters, and pass through EXPLAIN:
```
EXPLAIN SELECT *
FROM accounts acc,
     inf_accounts inf
WHERE acc.account_number = inf.account_number;
```
This will likely reveal whether an index is being used to compare acc.account_number and inf.account_number. If not, then you will likely be taking a long time to join the two tables to determine which rows of accounts to update.

Additionally, as O. Jones mentioned in the other answer, there is no limiting of which rows in accounts to update. Without AND acc.account_status <> inf.account_status, you will set acc.account_status even if it is already equal to inf.account_status, resulting in an unnecessary write, and thereby taking up extra time.

If you haven’t already, be sure to index both the account_status and account_number columns on both tables. EXPLAIN may help you identify the actual bottleneck in the UPDATE call (note that EXPLAIN will not run the query, but only show the execution plan. If you want to see the actual runtime performance, use EXPLAIN ANALYZE instead)
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.