When trying to use SERIALIZABLE transactions in PostgreSQL, I'm hitting very unexpected serialization failures on the simplest possible access pattern: SELECT a row to see if it exists, then INSERT it when it doesn't.
To set the background:
```sql
-- psql (14.10 (Ubuntu 14.10-0ubuntu0.22.04.1), server 16.1 (Debian 16.1-1.pgdg120+1))
create table kv(
    key   TEXT NOT NULL PRIMARY KEY,
    value TEXT);
```
Now, we run the following transactions in two terminals:
Terminal 1:
```sql
begin isolation level serializable read write;
SET enable_seqscan = off;
select value from kv where key = 'a';             -- (0 rows)
insert into kv(key, value) values ('a', 'vala');  -- INSERT 0 1
commit;                                           -- COMMIT
```
Terminal 2:
```sql
begin isolation level serializable read write;
SET enable_seqscan = off;
select value from kv where key = 'b';             -- (0 rows)
insert into kv(key, value) values ('b', 'valb');  -- INSERT 0 1
commit;
```

```
ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.
```
The above reproduces for me 100% consistently, as long as the order of operations is:
1. begin transaction in 1st terminal
2. begin transaction in 2nd terminal
3. select in 1st
4. select in 2nd
5. insert in 1st
6. insert in 2nd
7. commit in 1st
8. commit in 2nd
After step 6, before either commit, pg_locks looks like this:
```sql
select nspname, relname, l.*
from pg_locks l
join pg_class c on (relation = c.oid)
join pg_namespace nsp on (c.relnamespace = nsp.oid)
where pid in (select pid
              from pg_stat_activity
              where datname = current_database()
                and query != current_query());
```
| nspname | relname | locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 5/13 | 37 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 5/13 | 37 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 5/13 | 37 | RowExclusiveLock | t | t | |
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 3/14 | 32 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 3/14 | 32 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 3/14 | 32 | RowExclusiveLock | t | t | |
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 3/14 | 32 | SIReadLock | t | f | |
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 5/13 | 37 | SIReadLock | t | f | |
Interestingly, if instead of interleaving the selects and inserts across the two transactions, I do the select and insert in the first transaction and then the select and insert in the other, both of them commit successfully. The pre-commit locks then look as follows:
| nspname | relname | locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 5/15 | 37 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 5/15 | 37 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 5/15 | 37 | RowExclusiveLock | t | t | |
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 3/18 | 32 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 3/18 | 32 | AccessShareLock | t | t | |
| public | kv | relation | 5 | 24819 | | | | | | | | 3/18 | 32 | RowExclusiveLock | t | t | |
| public | kv_pkey | page | 5 | 24824 | 1 | | | | | | | 5/15 | 37 | SIReadLock | t | f | |
| public | kv_pkey | relation | 5 | 24824 | | | | | | | | 3/18 | 32 | SIReadLock | t | f | |
Note how instead of two relation SIReadLocks, we have one relation and one page lock.
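To watch just these predicate locks while reproducing the scenario, a narrower query helps (a sketch; the column subset is my choice, not taken from the query above):

```sql
-- Show only the SIRead (predicate) locks, with the locked object's name
-- and granularity: a NULL page/tuple means the lock covers the whole relation.
select relation::regclass as rel, locktype, page, tuple, pid
from pg_locks
where mode = 'SIReadLock';
```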
I already bumped the `max_pred_locks_per_*` settings:

```sql
show all;
```

| name | setting |
|---|---|
| max_pred_locks_per_page | 100 |
| max_pred_locks_per_relation | 100 |
| max_pred_locks_per_transaction | 100 |
Questions:
- Just as a sanity check: logically speaking, these two transactions are obviously serializable, and this is just Postgres being unable to prove it to itself, right?
- Why is Postgres failing to see it? Why does it take two relation-level locks on the primary key index?
- Why does select-select-insert-insert order fail, but select-insert-select-insert order work?
- What can I do to make such an obvious and simple access pattern work? I'm clearly touching non-overlapping sets of keys in the two transactions, so there should be a practical way to make it work, I hope.
2 Answers
Yes. An important observation here is that it's reasonable to assume the 2nd step each client took could have been dictated by the outcome of its 1st query, because they both read the same, whole (empty) thing and then tried to write to it. There's nothing in the table and nothing in the index, so it's assumed both agents target the whole relation, the same way they would if there were no `where` clause.

In your second example it's possible to determine these clients aren't on colliding trajectories, because there's a record already in place and the `where` condition clearly dodges it: in that case there's something in the table to work with (or rather around).
The documentation underlines two primary methods: be as brief and as specific as possible, and prepare to retry. You're clearly applying both, and your second case shows it's already starting to pay off.
Right.

The interesting locks are the predicate locks (`SIReadLock`). They are taken on the index page that would have contained the row that was not found by the `SELECT`. Since the table is small, both `a` and `b` would end up on the same index page (the root page). If the index is empty, the whole index will be locked. Since both values belong on the same index page, you get a false positive serialization error.
If you enter some more rows and use values that go on different index pages, there would be no serialization error.
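As a rough way to try this out (the row count and key shape are my assumptions; which keys end up on which leaf pages depends on your data), you can pre-populate the table so the primary key index spans several pages before repeating the experiment:

```sql
-- Seed enough rows that the kv_pkey B-tree splits into multiple leaf pages,
-- so 'a' and 'b' no longer share one otherwise-empty page.
insert into kv (key, value)
select 'filler_' || i, 'v' || i
from generate_series(1, 1000) as i;

-- Sanity check: the index should now be larger than a single page.
select pg_relation_size('kv_pkey') / current_setting('block_size')::int
    as index_pages;
```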
No idea why the second execution order does not fail. I would have expected it to fail, but am too lazy to research.
It works just fine. You have to expect serialization errors and retry transactions. Retry the failing transaction, and it will succeed. That's how `SERIALIZABLE` works.
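A minimal client-side retry loop, sketched in Python. The `SerializationFailure` class and the `with_serializable_retry` helper are illustrative stand-ins, not part of any driver's API; in a real driver you would catch its error for SQLSTATE 40001 and roll back before retrying:

```python
class SerializationFailure(Exception):
    """Stand-in for the driver error carrying SQLSTATE 40001."""

def with_serializable_retry(txn, max_attempts=5):
    """Run txn(); on a serialization failure, retry up to max_attempts times.

    txn is assumed to begin and commit its own SERIALIZABLE transaction.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # A real driver would conn.rollback() here, and possibly
            # sleep with backoff before the next attempt.

# Simulated transaction that fails once and then succeeds, mimicking
# the "might succeed if retried" hint from the error message.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise SerializationFailure("40001")
    return "COMMIT"

print(with_serializable_retry(flaky_txn))  # prints: COMMIT
```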