
My Google App Engine application (Python 3, standard environment) serves requests from users: if the requested record does not exist in the database, the application creates it.

Here is the problem of duplicate records being written to the database:

When a user (via a browser) sends a request, the running GAE instance may temporarily fail to respond, and GAE then creates a new process to handle the request. As a result, two instances respond to the same request: both query the database at almost the same time, each finds no matching record, and each creates a new one. The result is two duplicate records.

Another scenario is that, for some reason, the user's browser sends the same request twice within less than 0.01 seconds; the two requests are processed by two instances on the server side, and again duplicate records are created.

I am wondering how one instance can temporarily lock the database to prevent a duplicate write from another instance.

I have considered the following schemes but have no idea whether they are efficient.

  1. For Python 2, Google App Engine provided "memcache", which could be used to mark the status of a query for the purpose of database locking. For Python 3, it seems one has to set up a Redis server to rapidly share the query status among different instances (a sketch of that locking pattern follows this list). How efficient is database locking via Redis?

  2. The session module of Flask. Sessions can be used to share data (most commonly, users' login status) across different requests and thus across different instances. I am wondering how quickly such data is exchanged between instances.
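
For reference, the usual Redis locking pattern relies on SET with the NX and EX flags, which atomically creates a key only if it does not already exist. Below is a minimal sketch using the redis-py client; the host, key format, and timeout are illustrative assumptions, not part of the original setup.

    import time
    import redis

    r = redis.Redis(host="10.0.0.3")  # assumed Redis host reachable from GAE

    def with_lock(lock_key, fn, timeout=10):
        # SET NX EX is atomic: exactly one instance can create the key,
        # and EX expires the lock so a crashed instance cannot hold it
        # forever.
        while not r.set(lock_key, "LOCKED", nx=True, ex=timeout):
            time.sleep(0.01)  # another instance holds the lock; retry
        try:
            return fn()
        finally:
            r.delete(lock_key)  # release the lock

A round trip to a Redis instance in the same region is typically well under a millisecond, so the cost of the lock is usually dominated by the datastore query it protects.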


Appended information (1)

I followed the advice to use a transaction, but it did not work.
Below is the code I used to test the transaction.

The reason for the failure may be that the transaction only works for the CURRENT client. For multiple requests arriving at the same time, the GAE server side creates different processes or instances to respond, and each process or instance has its own independent client.

    @staticmethod
    def get_test(test_key_id, unique_user_id, course_key_id, make_new=False):
        import time
        from datetime import datetime
        from google.cloud import datastore

        client = ndb.Client()
        with client.context():
            client2 = datastore.Client()
            print("transaction started at:", datetime.utcnow())
            # Note: this transaction belongs to the google.cloud.datastore
            # client (client2); the NDB query and put below go through the
            # separate NDB client.
            with client2.transaction():
                print("query started at:", datetime.utcnow())
                my_test = MyTest.query(
                    MyTest.test_key_id == test_key_id,
                    MyTest.unique_user_id == unique_user_id).get()
                time.sleep(5)  # widen the race window for the test
                if make_new and not my_test:
                    print("data creation started at:", datetime.utcnow())
                    my_test = MyTest(test_key_id=test_key_id,
                                     unique_user_id=unique_user_id,
                                     course_key_id=course_key_id,
                                     status="")
                    my_test.put()
                    print("data created at:", datetime.utcnow())
            print("transaction ended at:", datetime.utcnow())
            return my_test
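
If the goal is to keep the NDB models, note that google.cloud.ndb ships its own transaction support (ndb.transaction() and the @ndb.transactional decorator), which does cover NDB reads and writes, unlike the google.cloud.datastore transaction above. Here is a minimal get-or-create sketch; deriving the key name from the identifiers is an assumption for illustration, so that concurrent requests contend on the same entity:

    from google.cloud import ndb

    def get_or_create_test(test_key_id, unique_user_id, course_key_id):
        client = ndb.Client()
        with client.context():
            # A deterministic key name makes both requests address the
            # SAME entity; the transactions then conflict instead of
            # silently creating two records.
            key = ndb.Key(MyTest, "{}_{}".format(test_key_id, unique_user_id))

            def txn():
                my_test = key.get()
                if my_test is None:
                    my_test = MyTest(key=key,
                                     test_key_id=test_key_id,
                                     unique_user_id=unique_user_id,
                                     course_key_id=course_key_id,
                                     status="")
                    my_test.put()
                return my_test

            return ndb.transaction(txn)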

Appended information (2)

Here is new information about the usage of memcache (Python 3).
I tried the following code to lock the database using memcache, but it still failed to prevent duplicate writes.

    @user_student.route("/run_test/<test_key_id>/<user_key_id>/")
    def run_test(test_key_id, user_key_id=0):
        import time
        from google.appengine.api import memcache

        cache_key_id = test_key_id + "_" + user_key_id
        print("cache_key_id", cache_key_id)

        counter = 0
        client = memcache.Client()
        while True:  # retry loop
            result = client.gets(cache_key_id)
            if result is None or result == "":
                # cas() is meant to follow a successful gets() of an existing
                # value; when the key does not exist yet there is no cas
                # token, so this call does not reliably take the lock. The
                # gap between gets() and cas() is not atomic either.
                client.cas(cache_key_id, "LOCKED")
                print("memcache added new value: counter =", counter)
                break
            time.sleep(0.01)
            counter += 1
            if counter > 500:
                print("failed after 500 tries.")
                break

        my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id,
                                  current_user.course_key_id, make_new=True)
        client.cas(cache_key_id, "")   # attempted release via cas()
        memcache.delete(cache_key_id)  # the key is deleted anyway
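
For what it is worth, the bundled memcache API has an atomic primitive for exactly this pattern: add() stores a value only if the key does not already exist, and returns False otherwise, so the check and the write cannot be interleaved between two instances. Here is a sketch of the lock loop rebuilt around add(), reusing the key format from the code above:

    import time
    from google.appengine.api import memcache

    def acquire_lock(cache_key_id, retries=500):
        client = memcache.Client()
        for _ in range(retries):
            # add() succeeds for exactly one caller while the key is
            # absent, so only one instance can take the lock.
            if client.add(cache_key_id, "LOCKED", time=30):  # 30 s expiry
                return True
            time.sleep(0.01)
        return False  # lock still held after all retries

Releasing the lock is then just client.delete(cache_key_id); the expiry time guards against an instance that dies while holding the lock.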

3 Answers


  1. Chosen as BEST ANSWER

    I used memcache in the following way (using get/set) and succeeded in preventing the duplicate database writes.

    It seems that gets/cas does not work well. In one test, I set the value with cas(), but it then failed to read the value back with gets().

    Memcache API: https://cloud.google.com/appengine/docs/standard/python3/reference/services/bundled/google/appengine/api/memcache

    @user_student.route("/run_test/<test_key_id>/<user_key_id>/")
    def run_test(test_key_id, user_key_id=0):
        import time
        from google.appengine.api import memcache

        cache_key_id = test_key_id + "_" + user_key_id
        print("cache_key_id", cache_key_id)

        counter = 0
        client = memcache.Client()
        while True:  # retry loop
            result = client.get(cache_key_id)
            if result is None or result == "":
                # get() and set() are still two separate calls, so a small
                # race window remains, but it is far narrower than the
                # query-then-put window this lock protects.
                client.set(cache_key_id, "LOCKED")
                print("memcache added new value: counter =", counter)
                break
            time.sleep(0.01)
            counter += 1
            if counter > 500:
                return "failed after 500 tries of memcache checking."

        my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id,
                                  current_user.course_key_id, make_new=True)
        client.delete(cache_key_id)  # release the lock

        ...
    

  2. Transactions:
    https://developers.google.com/appengine/docs/python/datastore/transactions

    When two or more transactions simultaneously attempt to modify entities in one or more common entity groups, only the first transaction to commit its changes can succeed; all the others will fail on commit.

    You should be updating your values inside a transaction. App Engine’s transactions will prevent two updates from overwriting each other as long as your read and write are within a single transaction. Be sure to pay attention to the discussion about entity groups.

    You have two options:

    • Implement your own logic for transaction failures (how many times to
      retry, etc.)
    • Instead of writing to the datastore directly, create a task that
      modifies the entity and run the transaction inside the task. If it
      fails, App Engine will retry the task until it succeeds.
  3. If the problem is duplication rather than overwriting, maybe you should specify the data id when creating new entries instead of letting GAE generate a random one for you. Then the application will write to the same entry twice instead of creating two entries. The data id can be anything unique, such as a session id, a timestamp, etc. (see the sketch below).

    The problem with a transaction is that it prevents you from modifying the same entry in parallel, but it does not stop you from creating two new entries in parallel.
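
    NDB has a helper for exactly this pattern: Model.get_or_insert() transactionally returns the entity stored under a key name or creates it. A minimal sketch, assuming the id is derived from the identifiers in the question:

        from google.cloud import ndb

        client = ndb.Client()
        with client.context():
            # get_or_insert runs a transactional get-or-create on the key
            # name, so two parallel requests end up with one shared entity.
            my_test = MyTest.get_or_insert(
                "{}_{}".format(test_key_id, unique_user_id),  # deterministic id
                test_key_id=test_key_id,
                unique_user_id=unique_user_id,
                course_key_id=course_key_id,
                status="")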
