skip to Main Content

I have a daemon that runs background jobs requested by our webservice. We have 4 workers running simultaneously.

Sometimes a job is executed twice at the same time, because two workers decided to run that job. To avoid this situation we tried several things:

  1. Since our jobs comes from our databases, we added a flag called executed, that prevents other works to get a job that has already been started to execute; This does not solve the problem, sometimes the delay with our database is enough to have simultaneous executions;
  2. Added memcached in the system (all workers run in the same system), but somehow we had simultaneous jobs running today — memcached does not solve for multiple servers as well.

Here is the following logic we are currently using:

// We create our memcached server
$memcached = new Memcached();
$memcached->addServer("127.0.0.1", 11211);

// Checkup every 5 seconds for operations
while (true) {
    // Gather all operations TODO
    // In this query, we do not accept operations that are set
    // as executed already.
    $result = findDaemonOperationsPendingQuery();

    // We have some results!
    if (mysqli_num_rows($result) > 0) {
        $op = mysqli_fetch_assoc($result);
        echo "Found an operation todo #" . $op['id'] . "n";

        // Set operation as executed
        setDaemonOperationAsDone($op['id'], 'executed');

        // Verifies if operation is happening on memcached
        if (get_memcached_operation($memcached, $op['id'])) {
            echo "tOperation id already executing...n";
            continue;

        } else {
            // Set operation on memcached
            set_memcached_operation($memcached, $op['id']);
        }

        ... do our stuff
    }
}

How this kind of problem is usually solved?
I looked up on the internet and found out a library called Gearman, but I’m not convinced that it will solve my problems when we have multiple servers.

Another thing I thought was to predefine a daemon to run the operation at insertion, and create a failsafe exclusive daemon that runs operations set by daemons that are out of service.

Any ideas?

Thanks.

2

Answers


  1. You have a typical concurrency problem.

    1. Worker 1 reads the table, select a job
    2. Worker 1 update the table to mark the job as ‘assigned’ or whatever
    3. Oh but wait, between 1 and 2, worker 2 read the table as well, and since the job wasn’t yet marked a ‘assigned’, worker 2 selected the same job

    The way to solve this is to use transactions and locks, in particular SELECT.. FOR UPDATE. It’ll go like this:

    1. Worker 1 starts a transaction (START TRANSACTION) and tries to acquire an exclusive lock SELECT * FROM jobs [...] FOR UPDATE
    2. Worker 2 does the same. Except he has to wait because Worker 1 already has the lock.
    3. Worker 1 updates the table to say he’s now working on the job and commit the transaction immediately. This releases the lock for other workers to select jobs. Worker 1 can now safely start working on this job.
    4. Worker 2 can now read the table and acquire a lock. Since the table has been updated, worker 2 will select a different job.

    EDIT: Specific comment about your PHP code:

    • Your comment says you are fetching all the jobs that needs to be done at once in each worker. You should only select one, do it, select one, do it, etc.
    • You are setting the flag ‘executed’ when in fact it’s not (yet) executed. You need a ‘assigned’ flag, and a different ‘executed’ flag.
    Login or Signup to reply.
  2. An alternative solution to using locks and transactions, assuming each worker has an id.

    In your loop run:

    UPDATE operations SET worker_id = :wid WHERE worker_id IS NULL LIMIT 1;
    
    SELECT * FROM operations where executed = 0 and worker_id = :wid;
    

    The update is a single operation which is atomic and you are only setting worker_id if it is not yet set so no worries about race conditions. Setting the worker_id makes it clear who owns the operation. The update will only assign one operation because of the LIMIT 1.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search