I have a Postgres DB with a "hot" table, meaning it is receiving many updates and deletes every second. I would like to run a query that will UPDATE about 10% of its rows. I do not have an index on the 10% criterion, and the isolation level is set to SERIALIZABLE as a DB flag (affecting all transactions). I would like this query to succeed, even if it temporarily blocks other ongoing updates. This is a one-time data migration and I won’t need an index on this criterion once the transaction completes. The query will run for several minutes.
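For reference, the cluster-wide default can be confirmed in psql:

SHOW default_transaction_isolation;  -- reports 'serializable' in this setup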
It seems I have a few paths forward, and I’m trying to determine which would be best:
- I could take out an explicit table lock, e.g.:
BEGIN;
LOCK TABLE hotness IN SHARE MODE;
SELECT * FROM hotness WHERE ...; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
COMMIT;
But I wonder if that approach would block more concurrent manipulations than is strictly necessary.
- I could create a temporary index in the txn:
BEGIN;
CREATE INDEX foo ON hotness (...);
SELECT * FROM hotness WHERE ...; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
DROP INDEX foo;
COMMIT;
This seemed appealing at first, since the index would allow only certain updates to conflict (thanks to predicate locks), but CREATE INDEX itself takes out a SHARE mode lock, so it seems strictly worse than option 1 (the same table lock, plus the overhead of building the btree).
- I could use explicit row-level locks, e.g.:
BEGIN;
SELECT * FROM hotness WHERE ... FOR NO KEY UPDATE; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
COMMIT;
but without an index on this criterion, I think the planner would execute a SeqScan, which would effectively row-lock the entire table, which also seems worse than option 1.
- I could rely on the SERIALIZABLE isolation level and retry again and again, hoping to catch a window where we happen to be the "winning" txn and don’t throw a serialization exception. In my experiments this doesn’t seem feasible at all, given how many concurrent inserts and updates are happening in this table while my query is running. (The retry pattern is sketched just after this list.)
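The retry pattern from option 4 looks roughly like this, sketched with a hypothetical needs_migration criterion and placeholder value; each attempt has to be a brand-new transaction, because a serialization failure aborts the whole transaction:

BEGIN;  -- already SERIALIZABLE via the DB flag
SELECT * FROM hotness WHERE needs_migration;  -- hypothetical 10% criterion
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = 'some_result' WHERE needs_migration;
COMMIT;
-- On SQLSTATE 40001 (serialization_failure), the client rolls back and
-- repeats the whole transaction from the top until it eventually commits.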
Any thoughts on which approach is best?
2 Answers
There are several ways to update 10% of the rows in a "hot" PostgreSQL table that undergoes frequent updates and deletes. Locking the table in SHARE mode guarantees the update will succeed, but it temporarily blocks other writing transactions. Creating a temporary index inside the transaction is resource-intensive and takes the same table-level lock, so it adds cost without reducing contention. Without an index on the criterion, row-level locking with FOR NO KEY UPDATE effectively behaves like a table-level lock. And since serialization errors occur frequently on a hot table, relying on the SERIALIZABLE isolation level with retries is unlikely to be practical.
In my opinion, option 1 (a table-level lock) is feasible given your constraints, though it will block other transactions for the duration. It is worth fine-tuning the details (when you run the migration and how long the lock is held) against the particular requirements and performance effects of your application.
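A minimal sketch of option 1, assuming a hypothetical needs_migration criterion and a placeholder result value; the SET LOCAL lock_timeout guard is optional and makes the LOCK statement fail fast instead of waiting indefinitely behind long-running writers:

BEGIN;
SET LOCAL lock_timeout = '10s';    -- optional: abort rather than queue indefinitely
LOCK TABLE hotness IN SHARE MODE;  -- blocks concurrent INSERT/UPDATE/DELETE, allows reads
SELECT * FROM hotness WHERE needs_migration;  -- hypothetical 10% criterion
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = 'some_result' WHERE needs_migration;
COMMIT;  -- releases the table lock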
As you noticed yourself, options 2. and 3. are not useful. Only 1. and 4. look promising.
If the table is really hot, I think that the only feasible option will be an explicit table lock.
SHARE mode would be sufficient to prevent deadlocks, but there could still be serialization errors if concurrent transactions read the rows you want to modify. If you want to be certain that no conflicts can occur, you have to take an ACCESS EXCLUSIVE lock on the table. Both a SHARE and an ACCESS EXCLUSIVE lock on the table are problematic, because they will cancel concurrent autovacuum runs. So only consider this solution if this update of 10% of the table does not occur often.

To see if the less invasive method 4 could work, you’d have to experiment. It may well be that your transaction never succeeds on a hot table.
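If you go the ACCESS EXCLUSIVE route, a minimal sketch (again with a hypothetical criterion and value); note that while the LOCK statement itself is waiting, even plain SELECTs queue up behind it, so a lock_timeout is advisable:

BEGIN;
SET LOCAL lock_timeout = '5s';  -- fail fast and retry later rather than stall all access
LOCK TABLE hotness IN ACCESS EXCLUSIVE MODE;  -- blocks all concurrent access, including SELECT
UPDATE hotness SET x = 'some_result' WHERE needs_migration;  -- hypothetical criterion/value
COMMIT;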