I’m trying to create an architecture where most of the data is quite stable: it does not need strict consistency and it does not change very often. Furthermore, the data is small (around a couple of MB).
Therefore, instead of using Memcache, when I initialise an instance in Google App Engine (using Go) I first fetch all the information, cache it in memory as a warm-up, and work with it from there.
But I need to update it every so often, so I need a way to address a specific instance and update its in-memory cache.
I’ve thought of different solutions:
- After processing a user request, if the information is outdated, update it in the background. Problem: that request may take quite a while to finish; even if I try to close the connection to the user and flush the data, the request is not fully done until the refresh completes. Furthermore, if many requests arrive while the cache needs updating, I also have to deal with concurrency to avoid refreshing it more than once (see the sketch after this list).
- Using Cron + Pub/Sub (similar to what is done here: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine, but that example is for Compute Engine). The problem is that I can only “hit” one instance of a service at a time through an endpoint URL I define, so I cannot update all the instances at will.
- Kill and renew instances. I don’t like this very much for obvious reasons.
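For illustration, here is a minimal sketch (in Go) of how the concurrency concern in the first option could be handled; refreshCache, the expiration bookkeeping, and the ten-minute TTL are all assumptions, and only one goroutine ever triggers the refresh at a time:

```go
package main

import (
	"sync/atomic"
	"time"
)

// refreshing is 1 while a background refresh is in flight, 0 otherwise.
var refreshing int32

// cacheExpires holds a time.Time: when the current in-memory data expires.
var cacheExpires atomic.Value

// refreshCache is a placeholder for whatever reloads the in-memory data.
func refreshCache() {
	// ... reload the data from its source of truth ...
	cacheExpires.Store(time.Now().Add(10 * time.Minute))
}

// maybeRefresh is called after serving a request: it returns immediately and,
// if the cache is stale, lets exactly one goroutine refresh it in the background.
func maybeRefresh() {
	exp, _ := cacheExpires.Load().(time.Time)
	if time.Now().Before(exp) {
		return // still fresh
	}
	// Only the goroutine that flips the flag from 0 to 1 does the refresh.
	if !atomic.CompareAndSwapInt32(&refreshing, 0, 1) {
		return
	}
	go func() {
		defer atomic.StoreInt32(&refreshing, 0)
		refreshCache()
	}()
}

func main() {
	maybeRefresh()
}
```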
With Basic Scaling it is possible to address a specific instance, but I can’t find a way to do it with Automatic Scaling, as described here: https://cloud.google.com/appengine/docs/standard/go/how-instances-are-managed
So, can you imagine any graceful way to update the in-memory state of all the instances at once without disturbing the clients?
How can I hit all of the instances of App Engine individually to do an update of the in-memory cache?
Answers
Accessing all dynamic instances is usually troublesome and something you should not rely on.
Instead, redesign and use a different approach.
Have all your instances use an in-memory cache, but give the cached data an expiration time. Whenever the data is needed, first check whether it is still valid (check the expiration time); if it is, go ahead and use it. If it has expired, fetch fresh data from “some” place. That “some” place may be the Memcache or the Datastore, or optionally both (e.g. first try the Memcache and, if it’s not there, fall back to the Datastore); it may even be a completely different place, outside of Google Cloud Platform. The freshly fetched data should carry its own expiration time.
This approach does not require you to reach the dynamic instances: they will take care of refreshing their cached data automatically once it expires.
If accessed from multiple goroutines, access to the cached data must be synchronized. It is best to use a sync.RWMutex, so multiple readers don’t block each other (the frequent operation), and the write lock is only acquired when the cached data has expired and needs to be refreshed. Here’s an example implementation of such an in-memory cache:
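A minimal sketch, assuming the cached data is a []byte and that fetchFreshData is a placeholder for wherever the real data actually comes from (Memcache, the Datastore, an external service):

```go
package main

import (
	"sync"
	"time"
)

// cachedData holds the cached value and the time at which it expires.
type cachedData struct {
	value   []byte    // the cached data; use your own type here
	expires time.Time // when the cached copy stops being valid
}

var (
	mu    sync.RWMutex // guards cache
	cache cachedData
)

// fetchFreshData is a placeholder for wherever the real data comes from.
// It also reports how long the returned data may be cached.
func fetchFreshData() ([]byte, time.Duration) {
	// ... load the data from its source of truth ...
	return []byte("fresh data"), 10 * time.Minute
}

// getData returns the cached data, refreshing it first if it has expired.
func getData() []byte {
	// Fast path: read lock only, so concurrent readers don't block each other.
	mu.RLock()
	if time.Now().Before(cache.expires) {
		v := cache.value
		mu.RUnlock()
		return v
	}
	mu.RUnlock()

	// Slow path: the data has expired; take the write lock and refresh it.
	mu.Lock()
	defer mu.Unlock()
	// Re-check: another goroutine may have refreshed it while we waited.
	if time.Now().Before(cache.expires) {
		return cache.value
	}
	v, ttl := fetchFreshData()
	cache = cachedData{value: v, expires: time.Now().Add(ttl)}
	return v
}

func main() {
	_ = getData() // first call fetches; later calls hit the in-memory copy
}
```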
In addition to icza’s answer:
You can target specific instances only when using manual scaling (source).
I would suggest:
In 3 you live with returning old data but trigger an update of it in the background. If this is not good enough, consider using Cron in App Engine.
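For reference, a minimal cron.yaml sketch for App Engine; the /tasks/refresh-cache handler path and the ten-minute schedule are assumptions:

```yaml
cron:
- description: refresh the in-memory caches
  url: /tasks/refresh-cache   # hypothetical handler that reloads the data
  schedule: every 10 minutes
```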