
I’m trying to create an architecture where most of the data is quite stable: it doesn’t need strict consistency and it doesn’t change very often. Furthermore, the data is small (around a couple of MB).

Therefore, instead of using Memcache, when I initialise an instance in Google App Engine (using Go) I first fetch all the information, cache it in memory as a warm-up, and work from that cache.

But I need to update it every X amount of time, so I need a way to address a specific instance and refresh its in-memory cache.

I’ve thought of different solutions:

  • After processing a user request, if the information is outdated, update it in the background. Problem: that user request may take quite a while to finish. Even if I try to close the connection to the user and flush the data, the request isn’t fully processed until the background work is done. Furthermore, if many requests arrive while the cache needs updating, I’d also have to deal with concurrency to avoid refreshing it more than once.
  • Using Cron + Pub/Sub (similar to what is done here: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine, although that is for Compute Engine). The problem is that through the endpoint URL I define I can only “hit” one instance of a service at a time, so I cannot update all the instances at will.
  • Kill and renew instances. I don’t like this very much for obvious reasons.

With Basic Scaling it is possible to address a specific instance, but I can’t find a way to do it with Automatic Scaling, as described here: https://cloud.google.com/appengine/docs/standard/go/how-instances-are-managed

So… can you imagine any graceful way to update the in-memory state of all the instances at once, without disturbing the clients?
How can I hit every App Engine instance individually to refresh its in-memory cache?

3 Answers


  1. Accessing all dynamic instances is usually troublesome and something you should not rely on.

    Instead, redesign and use a different approach.

    Have all your instances use an in-memory cache, but give the cached data an expiration time. Whenever the data is needed, first check whether it is still valid (compare against the expiration time), and if it is, go ahead and use it. If it has expired, fetch fresh data from “some” place. This “some” place may be Memcache or the Datastore, or optionally both (e.g. try Memcache first, and fall back to the Datastore on a miss); it may even be a completely different place, outside of Google Cloud Platform. The fetched data should come with its own expiration time.

    This approach does not require you to reach the dynamic instances, they will take care of refreshing their cached data once it expires, automatically.

    If accessed from multiple goroutines, access to the cached data must be synchronized. Best would be to use a sync.RWMutex, so you can allow multiple readers without blocking each other (frequent operation), and only acquire write lock if the cached data has expired and needs to be refreshed.

    Here’s an example implementation of such an in-memory cache:

    import (
        "sync"
        "time"
    )
    
    func getFreshData() (data interface{}, expires time.Time, err error) {
        // Implement getting fresh data here:
        return nil, time.Now().Add(time.Minute), nil
    }
    
    type cachedData struct {
        sync.RWMutex
        data    interface{}
        expires time.Time
    }
    
    var cd = new(cachedData) // zero value is ready to use
    
    func Get() (data interface{}, err error) {
        cd.RLock()
        if time.Now().Before(cd.expires) {
            // We're done: we can use the cached data:
            data = cd.data
            cd.RUnlock()
            return
        }
    
        cd.RUnlock()
    
        // Either we don't have cached data or it has expired.
        // Acquire write lock and get data
        cd.Lock()
        defer cd.Unlock()
    
        // But once we have the write lock, check again, as another competing
        // goroutine might have fetched data before us:
        if time.Now().Before(cd.expires) {
            // Another goroutine fetched fresh data:
            return cd.data, nil
        }
    
        // Nope, we have to do it ourselves:
        data, expires, err = getFreshData()
        if err == nil {
            // Also put fresh data into the cache:
            cd.data = data
            cd.expires = expires
        } else {
            // There was an error getting it; cache the failure for 5 seconds
            // so we don't keep hammering the data source:
            cd.data = nil
            cd.expires = time.Now().Add(5 * time.Second)
        }
    
        return
    }
    
  2. In addition to icza’s answer:

    You can target specific instances only when using manual scaling:

    If you are using manually-scaled services, you can target and send a request to an instance by including the instance ID. The instance ID is an integer in the range from 0 up to the total number of instances that are running, and can be specified as follows:

    Sends a request to a specific service and version within a specific instance:

    https://[INSTANCE_ID]-dot-[VERSION_ID]-dot-[SERVICE_ID]-dot-[MY_PROJECT_ID].appspot.com
    http://[INSTANCE_ID].[VERSION_ID].[SERVICE_ID].[MY_CUSTOM_DOMAIN]
    

    Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.

    source
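    As a sketch of how the quoted URL pattern could be used: with a manually scaled service, you know how many instances exist, so a scheduled job could loop over the instance IDs and hit each one. The service/version/project names and the `/refresh` path below are assumptions, not anything from the docs — you would implement such a refresh handler yourself.

    ```go
    package main

    import "fmt"

    // buildInstanceURL returns the address of one specific instance of a
    // manually scaled App Engine service, following the pattern quoted above.
    func buildInstanceURL(instanceID int, versionID, serviceID, projectID string) string {
        return fmt.Sprintf("https://%d-dot-%s-dot-%s-dot-%s.appspot.com",
            instanceID, versionID, serviceID, projectID)
    }

    func main() {
        // Hypothetical values: three manually scaled instances of service
        // "default", version "v1", in project "my-project".
        const numInstances = 3
        for id := 0; id < numInstances; id++ {
            fmt.Println(buildInstanceURL(id, "v1", "default", "my-project") + "/refresh")
            // In real code you would http.Get/http.Post this URL and check the error.
        }
    }
    ```

    Note that this only works because manual scaling gives you a fixed, known instance count; with automatic scaling there is no such enumeration, which is why the answers above suggest a different design.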

  3. I would suggest:

    1. Use memcache for your in-memory needs, thus working across all instances
    2. Back memcache with datastore: memcache can be cleared without warning so on a memcache miss, fetch from datastore and update memcache.
    3. In a request, if the data is old, create a Task Queue entry. When this executes, have it update the Datastore and Memcache. Critically, however, you should name the task and choose the same name in all instances: only one task with a given name can exist at a time, so multiple instances will not create multiple tasks.

    In step 3 you live with returning old data while triggering an update. If that is not acceptable, consider using cron in App Engine instead.
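    One way all instances can independently agree on the same task name, as step 3 requires, is to derive the name from a time bucket. This is a sketch under assumptions: the `refresh-cache-` prefix and 5-minute window are made up, and on App Engine you would set the result as the task's name when enqueuing it, treating a "task already exists" error as success.

    ```go
    package main

    import (
        "fmt"
        "time"
    )

    // refreshTaskName derives a task name that is identical across all
    // instances within the same refresh window. Since only one task with a
    // given name can exist at a time, only the first instance to enqueue it
    // succeeds; the others get a duplicate-name error they can ignore.
    func refreshTaskName(t time.Time, window time.Duration) string {
        bucket := t.Unix() / int64(window.Seconds())
        return fmt.Sprintf("refresh-cache-%d", bucket)
    }

    func main() {
        // Two instances computing the name inside the same 5-minute window
        // agree on it, so only one refresh task is ever created per window.
        a := refreshTaskName(time.Unix(1000, 0), 5*time.Minute)
        b := refreshTaskName(time.Unix(1100, 0), 5*time.Minute)
        fmt.Println(a, a == b)
    }
    ```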
