I ran the script below concurrently in two pgAdmin sessions to see how SELECT FOR UPDATE behaves.

The logic is that there is a limit on the number of users per team, and if the count reaches the limit, the insert is aborted.

The script is below. The pg_sleep call gives me time to execute script 2 before script 1 finishes.

DO $$
declare
    team_user_count INT;

BEGIN
    RAISE NOTICE 'Insert start at %', timeofday();
    
    SELECT COUNT(user_id) INTO team_user_count
    FROM (SELECT user_id FROM mst_user_team WHERE team_id = 1 FOR UPDATE) mut;

    if team_user_count >= 30 then
        RAISE NOTICE 'Not accept insert. team_user_count=%', team_user_count;
    else
        RAISE NOTICE 'Accept insert. team_user_count=%', team_user_count;    
        PERFORM pg_sleep(10);    
        -- insert script here
    end if;

    RAISE NOTICE 'Insert done at %', timeofday();

END$$;

At the start, the number of users is 29. The two scripts run concurrently and both successfully insert data; both notices indicate that the current number of users is 29. Why does this happen?

My expectation was that script 2 would wait until script 1 ends, then execute the SELECT FOR UPDATE query and get the number 30. But apparently it had already read the data, and only waited until script 1 finished to show the result.

Is my understanding correct?

3 Answers


  1. Yes, your understanding is correct. The behavior you are observing is due to the way PostgreSQL handles FOR UPDATE in concurrent transactions.

    When a transaction issues a SELECT … FOR UPDATE statement, it acquires a lock on the selected rows. Other transactions trying to acquire a conflicting lock (in this case, for update on the same rows) will be forced to wait until the first transaction either commits or rolls back.

    Both transactions initially see the same snapshot of the rows, but modifications to the data are serialized by the lock, ensuring that only one transaction can change the locked rows at a time. The lock does not, however, make the waiting transaction re-count rows that the first transaction inserted.
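
    A rough two-session timeline of what happens (table and column names are taken from the question; the exact INSERT column list is an assumption):

        -- Session 1
        BEGIN;
        SELECT user_id FROM mst_user_team WHERE team_id = 1 FOR UPDATE;  -- locks the 29 rows

        -- Session 2, started while session 1 is still open
        BEGIN;
        SELECT user_id FROM mst_user_team WHERE team_id = 1 FOR UPDATE;  -- blocks here

        -- Session 1
        INSERT INTO mst_user_team (team_id, user_id) VALUES (1, 30);
        COMMIT;

        -- Session 2 now unblocks, but its SELECT still returns the original 29 rows.
        COMMIT;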

  2. The FOR UPDATE lock in PostgreSQL blocks conflicting locks on particular rows, but it does not stop other transactions from reading the same data. In your scenario, both scripts see the same user count before either one performs its INSERT. If necessary, consider using explicit advisory locks or another synchronization technique to coordinate the scripts.
    Hope it’s helpful 🙂
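
    A sketch of the advisory-lock variant (using the team_id as the lock key is an assumption; any integer that all scripts agree on works, and pg_advisory_xact_lock is released automatically at commit or rollback):

        DO $$
        DECLARE
            team_user_count INT;
        BEGIN
            -- All check-then-insert sequences for team 1 queue on this one lock.
            PERFORM pg_advisory_xact_lock(1);

            -- The count is now taken after any concurrent insert for team 1 has committed.
            SELECT COUNT(user_id) INTO team_user_count
            FROM mst_user_team
            WHERE team_id = 1;

            IF team_user_count >= 30 THEN
                RAISE NOTICE 'Not accept insert. team_user_count=%', team_user_count;
            ELSE
                RAISE NOTICE 'Accept insert. team_user_count=%', team_user_count;
                -- insert script here
            END IF;
        END$$;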

  3. Like all SQL statements, SELECT ... FOR UPDATE acquires a snapshot when it starts, that is, it determines which transactions are visible and which aren’t. Any rows that you INSERT later on aren’t visible to the statement, even if they were inserted before the SELECT completed.

    There is only one exception where a statement can see data that were added after its snapshot was taken: If a statement is blocked by a lock, it waits for the lock to be relinquished, and then it fetches the latest committed version of the row and checks again if that matches the query condition. This applies only to rows that were updated, but not rows that were inserted, which leads to interesting anomalies.

    Your case is a variant of that: one of the concurrent transactions (let’s call it transaction 1) grabs the lock on the rows, while the other (transaction 2) is blocked. Transaction 1 then inserts rows and commits. Transaction 2 fetches the latest versions of the locked rows and sees that they didn’t change, but it does not see the new rows that were inserted by transaction 1. So the test is passed, and transaction 2 also inserts some rows, violating the desired constraint.

    There is no way to guarantee a constraint like yours, unless you either use the SERIALIZABLE transaction isolation level or serialize the transactions (for example with an advisory lock).
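
    A sketch of the SERIALIZABLE variant (no FOR UPDATE needed, since the isolation level tracks the read dependency itself; the client must be prepared to retry the whole transaction on a serialization failure, SQLSTATE 40001, and the INSERT column list is an assumption):

        BEGIN ISOLATION LEVEL SERIALIZABLE;

        SELECT COUNT(user_id) FROM mst_user_team WHERE team_id = 1;

        -- Only if the count is below 30:
        INSERT INTO mst_user_team (team_id, user_id) VALUES (1, 30);

        COMMIT;  -- if two such transactions run concurrently, one aborts with SQLSTATE 40001 and must retry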
