skip to Main Content

I want to migrate my app to clusters so I have to
deal with mysql locking.
For this post, I have simple table with ‘id’ and ‘flag’ whether is
record updated or not.
There are 3 threads reading table of 5 records. What I want to achieve is
that each thread reads only record not being read by other thread.
(Note that threads are only for testing as in real life, there will be separate
process/apps to doing this).

What I’ve read about Mysql locking is that is should disable autoCommit
and SELECT record with LOCK.

Here is my example code for thread read:

public Long read(Connection c) throws Exception {
    c.setAutoCommit(false);
    c.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    PreparedStatement sth = c.prepareStatement("SELECT * FROM test WHERE flag!='Y' LIMIT 1 FOR UPDATE SKIP LOCKED") ;
    ResultSet rst = sth.executeQuery() ;
    Long id = null ;
    if (rst.next()) {
        id = rst.getLong("id");
        sth = c.prepareStatement("UPDATE test SET FLAG='Y' WHERE id=" + id) ;
        sth.executeUpdate();
        System.out.println(getName() + ": Commited id " + id);
    }
    c.commit();
    sth.close();
    return id ;
}

When running test with 3 threads the output is:

LockThread#2: Commited id 1
LockThread#2: Commited id 2
LockThread#3: Commited id 1
LockThread#2: Commited id 3
LockThread#1: Commited id 1
LockThread#1: Commited id 4
LockThread#3: Commited id 4
LockThread#2: Commited id 4
LockThread#3: Commited id 5
LockThread#2: Commited id 5
LockThread#1: Commited id 5

So the same record is not read only by one thread.
Is there any other trick to make an SQL transaction in Java.
How can I lock record upon retrieval, so next thread will read next non-locked record.

2

Answers


  1. Fundamentally your idea just will not work. You’re using ‘it is locked’ as a flag to communicate between threads. The message you are sending with it, is "this one is taken. Skip it; find something that isn’t taken yet".

    Generally locks are not so well defined unless you ‘lock in’ (heh) which database engine you’re using, which transaction level you pick, and for some DB engines, which table engine is used. In other words, any attempt to use locks as a message bus like this requires a ton of comments stating that it flat out will just do bizarre things and not work properly unless you use just the right DB engine in the exact right configuration. That’s "fragile code" and pretty much never worth doing.

    More specifically, locks aren’t forever. What if one thread locks a row, processes it, and unlocks it because it’s done, and then the next thread grabs the first unlocked row and re-processes one that is already processed? Surely that’s not your intent.

    Perhaps you think this doesn’t matter: Your intent is that any thread that ‘grabs hold of a row’ acquires a lock on it and will not relinquish it for a very long time indeed. It’ll be processing that row this side of forever.

    But then this is also a very very bad idea: This is simply not what locks are for. As developers we make it a habit to at least consider using things for purposes they weren’t designed for, but you shouldn’t resort to such acts unless there’s a pressing reason to resort to hackery (which is generally what that term is taken to mean for developers: Using things in ways they were never designed for. Not malicious intent to do stuff that you aren’t authorized to do; it’s simply that pretty much by definition if you’re granting yourself access to places you shouldn’t, you’re hacking by definition. But the term is broader than that).

    There are good reasons for that. For example, in this case, DBs are not designed with the thought that locks are used as flags and therefore are held for excessively long swaths of time. It means by definition that the transaction is open for ages (once you shut down a transaction all locks it holds are auto-closed). Both ‘really long lived transactions’ and ‘really long lived locks’ tend to play havoc on DB performance. You’re far better off using an actual flag, or a counter, or creative solutions such as hash bucketing.

    An actual flag

    Instead of using locks as a flag to indicate a row is ‘taken’ use… an actual flag. Make a db column of type BOOLEAN (if your DB engine supports it. Enum, or if you have to, the smallest numeric value, otherwise). Use the usual transactional safety mechanisms (preferably, serializable transaction isolation level and retry mechanics. Proper DB abstractions such as JDBI and JOOQ make this easy), so that any given thread attempts to set the flag from false to true, and such a thread continues only if it has ascertained that it was the one that actually performed the modification. This thread is now free to relinquish the transaction, there is no further lock, and this thread can return without that inherently implying the row is ‘open for processing again’ – because that boolean column is still true. If you want that row to be re-available for other threads to work on, set it to false – make it explicit.

    Via separate table

    Perhaps you are not in the position to modify the table. No problem – tables always have a unique identifier (and if somehow they don’t, there’s the DB engine’s unique identifier, such as OID in postgres as an example, there’s always a way) – so make a new table with just 2 columns: That identifier, and that boolean flag. You can even DROP the table once processing is complete if you don’t want it to stick around.

    Using DB intrinsics

    Various DBs have separate lock or sequencing primitives. I doubt they are a good fit here but they might be. Check your DB engine’s documentation – this tends to be well off the beaten path as far as standardized SQL syntax is concerned

    hash bucketing

    Yet another option is to find some unique property amongst all potential rows (for example, the unique ID column) and then design the following algorithm:

    I launch X threads; each thread gets a ‘mod’ field, which is 0…X (if I launch 5 threads, the first one has mod = 0, the second has mod = 1,.., the 5th has mod = 4. They also all know X (each thread has procLen = 5).

    Each thread instantly, without establishing any locks or even establishing a DB connection in the first place, inherently claims rows: It does that by stating that any given thread only processes those rows whose unique modifier, possibly hashed if needed (it’d be needed if the unique input isn’t uniformly distributed, or isn’t a simple number) for which the following is true:

    uniqueId % procLen = mod
    

    In other words, if you have 5 threads and are using the UNID as unique source, the first thread only takes rows with UNID 0, 5, 10, 15, 20, etc. The second one only 1,6,11,16,21, the third only 2,7,12,17,22, and so forth. DBs can generally give you the proper rows fairly quickly (UNID columns get indices by default for example). And there is no need to lock anything: Once all threads have processed their domains, you know every row has been processed. And if the size of the work to be done is generally equal (it takes about as long to process any row; contrast to a system where processing a row might take 5 seconds, or it might take 5 hours, depending on what’s in it – then this is potentially problematic), you are unlikely to need a rebalancing function. Rebalancing is what you need to do when 4 of the 5 threads have processed all their rows but the 5th thread still has, say, 20 to go through: It would want to assign some of the work to the threads that are twiddling their thumbs. You don’t need it, if the unique ID is uniformly distributed and time taken per row is essentially equal.

    Login or Signup to reply.
  2. You should try with manual transaction like:

    c.setAutoCommit(false);
    c.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    Statement sth = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE) ;
    sth.execute("BEGIN");
    ResultSet rst = sth.executeQuery("SELECT * FROM test WHERE flag!='Y' LIMIT 1 FOR UPDATE SKIP LOCKED") ;
    Long id = null ;
    if (rst.next()) {
        id = rst.getLong("id") ;
        rst.updateString("flag", "Y");
        rst.updateRow();
    }
    sth.execute("COMMIT");
    return id ;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search