How are you?
I have been using memcached for a while now and love it. Behind the cache I typically use Postgres or another relational database.
There are some cases where using a cache makes things complex, and I am not sure about the performance cost in those cases, so I figured it was better to ask here.
Imagine a situation where I have two APIs:
def all(filter_1: int = None, filter_2: int = None, filter_3: int = None)
(to filter all the results)
def update(id, data: dict)
(to update a single item)
On all I will cache the results, for example:
all() -> CACHE_KEY_ALL
all(filter_1=11) -> CACHE_KEY_ALL_filter_1_11
all(filter_1=11, filter_3=three) -> CACHE_KEY_ALL_filter_1_11_filter_3_three
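A minimal sketch of how those keys could be generated, assuming the filters arrive as keyword arguments; the CACHE_KEY_ALL prefix is from the question above, but the make_cache_key helper name is mine:

```python
def make_cache_key(**filters) -> str:
    # Start from the shared prefix used for all cached all() results
    key = "CACHE_KEY_ALL"
    # Append each provided filter in a stable (sorted) order so the
    # same set of filters always maps to the same cache key
    for name in sorted(filters):
        value = filters[name]
        if value is not None:
            key += f"_{name}_{value}"
    return key

# make_cache_key() -> "CACHE_KEY_ALL"
# make_cache_key(filter_1=11) -> "CACHE_KEY_ALL_filter_1_11"
```

Sorting the filter names matters: without it, all(filter_1=11, filter_3=...) and all(filter_3=..., filter_1=11) would produce two different keys for the same result set.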
On every update call I will need to clear the cache for all of the all cached items.
What I am doing today: in such cases I just don't use any cache.
My question is a performance question. What is faster?
- Not using cache at all in these cases
- Calling stats items, fetching all keys, looking for keys starting with CACHE_KEY_ALL, and invalidating each one of them
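One caveat with the second option: memcached has no supported way to enumerate keys by prefix (stats cachedump is undocumented and limited), so a common alternative is to track the generated keys yourself and delete them on update. This sketch uses a plain in-memory dict in place of a real memcached client; the class and method names are illustrative, not a real library API:

```python
class PrefixInvalidatingCache:
    """Tiny in-memory stand-in for a memcached client that also keeps
    a registry of stored keys, so every key sharing a prefix can be
    invalidated without scanning the server."""

    def __init__(self):
        self._store = {}   # stand-in for memcached storage
        self._keys = set() # registry of keys we have written

    def set(self, key, value):
        self._store[key] = value
        self._keys.add(key)

    def get(self, key):
        return self._store.get(key)

    def invalidate_prefix(self, prefix):
        # Delete every registered key starting with the prefix,
        # e.g. all cached variants of the all() results on update()
        for key in [k for k in self._keys if k.startswith(prefix)]:
            self._store.pop(key, None)
            self._keys.discard(key)
```

With a real memcached client the registry would need to live somewhere shared (for example in memcached itself, or you could switch to versioned/namespaced keys), so treat this as a sketch of the idea rather than a drop-in solution.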
What do you think?
🙏
2 Answers
I am doing okay, hope you are doing well too.
First, I want to point out the function naming. Function names should contain verbs because that makes them more readable. I was a bit confused by the all() naming, so it would be better for other developers if you used a verb in the function name.
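For example (these names are purely illustrative; pick whatever verbs fit your codebase):

```python
# all() -> list_items(): the verb says it fetches a (filtered) list
def list_items(filter_1: int = None, filter_2: int = None, filter_3: int = None):
    ...

# update() is fine, but naming the thing it updates also helps
def update_item(id, data: dict):
    ...
```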
Regarding the cache question: at first glance it does sound like there are more operations involved when you use caching. However, all those operations may still be faster than calling the API. So whether it's worth implementing the caching depends entirely (in my opinion) on whether that particular API is fast enough or not. I do not have access to that API, therefore I can't definitively tell you which is faster.
Hope my answer helps.
The performance of this largely depends on how often you do each operation.
Let's say you have four kinds of operations:
1. all() with a filter that other clients also use
2. all() with no filter
3. all() with a filter no one has used in a while
4. all() right after an update has cleared the cache
The case where a cache is effective is when it gets hit a lot. For your example, this is the case when clients use the same filter, or no filter (1, 2). If they use a filter no one has used in a while, or an update has just happened, it will miss (3, 4). I suggest you start by measuring the usage, or investigating the code, to determine how often each of these cases happens.
If updates are frequent (within a relative order of magnitude of the all calls) you can probably skip the cache altogether. If the all operation uses no filter, or the same filter for a lot of clients, the cache will probably be efficient. If each client provides their own filter, then the cache is probably inefficient.
Once you have profiled this data, you can continue by making reasonable changes, and as you make them, keep monitoring the performance of the system to see if it changes as you expect. But right now it's hard to say without more information whether your system does operations 1 & 2 the most, or operations 3 & 4.
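The measuring step above can start as simply as counting hits and misses around the cache lookup. A sketch, assuming a dict-backed cache and a fetch callback standing in for the database query (all names here are illustrative):

```python
from collections import Counter

stats = Counter()

def cached_all(cache: dict, key: str, fetch):
    """Look up `key` in `cache`; on a miss, call `fetch()` and store
    the result. Counts hits and misses so the hit rate can be checked
    before deciding whether the cache is worth keeping."""
    if key in cache:
        stats["hit"] += 1
        return cache[key]
    stats["miss"] += 1
    cache[key] = fetch()
    return cache[key]

cache = {}
cached_all(cache, "CACHE_KEY_ALL", lambda: [1, 2, 3])  # miss: fetches
cached_all(cache, "CACHE_KEY_ALL", lambda: [1, 2, 3])  # hit: cached
hit_rate = stats["hit"] / (stats["hit"] + stats["miss"])
# hit_rate -> 0.5
```

If the measured hit rate stays low (cases 3 & 4 dominate), that is a strong signal the cache plus invalidation overhead is not paying for itself.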