
How are you?

I have been using memcached for a while now and love it.
Behind the cache I typically use Postgres or another relational database.

There are some cases where using a cache makes things complex, and I am not sure about the performance cost in those cases, so I figured it was better to ask here.

Imagine a situation where I have two APIs:

  1. def all(filter_1: int = None, filter_2: int = None, filter_3: int = None) (to filter all results)
  2. def update(id, data: dict) (to update a single item)

On all I will cache the results, for example:

  1. all()
    CACHE_KEY_ALL
  2. all(filter_1=11)
    CACHE_KEY_ALL_filter_1_11
  3. all(filter_1=11, filter_3=three)
    CACHE_KEY_ALL_filter_1_11_filter_3_three

On every update call I will need to invalidate every one of the cached all results.

What I am doing today in such cases is simply not using any cache.

My question is about performance.

Which is faster:

  1. Not using a cache at all in these cases
  2. Calling stats items to fetch all keys, looking for keys that start with CACHE_KEY_ALL, and invalidating each one of them
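Option 2 can be sketched with a plain dict standing in for memcached (memcached itself has no reliable "list all keys" command; stats cachedump is unofficial and limited, which is worth knowing before committing to this approach). The function name is hypothetical:

```python
def invalidate_listing_keys(cache: dict, prefix: str = "CACHE_KEY_ALL") -> int:
    """Delete every cached entry whose key starts with the given prefix.

    The scan is O(total number of keys) even if only a few match; against a
    real memcached server, each delete also costs a network round trip.
    Returns the number of keys removed.
    """
    stale = [key for key in cache if key.startswith(prefix)]
    for key in stale:
        del cache[key]
    return len(stale)
```

Every update() call would run this once, so the trade-off is that per-update scan cost versus the database reads the cache saves between updates.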

What do you think?

🙏

2 Answers


  1. I am doing okay, hope you are doing well as well.

    First, I want to point out the function naming. Function names should contain verbs, because that makes them more readable. I was a bit confused by the all() naming, so it would be better for other developers if you used a verb in the function name.

    Regarding the cache question: initially it does sound like there are more operations involved when you use caching. However, all those operations may still be faster than calling the API. So whether it's worth implementing the caching depends entirely (in my opinion) on whether that particular API is fast enough or not. I do not have access to that API, so I can't definitively tell you which approach is faster.

    Hope my answer helps.

  2. The performance of this largely depends on how often you perform each operation.
    Let's say you have four kinds of operations:

    1. All – without a filter
    2. All – with a common filter that is the same for most clients
    3. Update – (invalidates the whole cache)
    4. All – with a customized filter that varies a lot between clients

    The case where a cache is effective is when it gets hit a lot. For your example, this is the case when the clients use the same filter or no filter (1, 2). If they use a filter no one has used in a while, or an update has just happened, the cache will miss (3, 4). I suggest you start by measuring the usage, or investigating the code, to determine how often each of these cases happens.

    If updates are frequent (within a relative order of magnitude of the all calls) you can probably skip the cache altogether. If the all operation uses no filter, or the same filter for a lot of clients, the cache will probably be efficient. If each client provides their own filter, then the cache is probably inefficient.

    Once you have this data profiled, you can continue by making reasonable changes, and as you do, keep monitoring the performance of the system to see whether it changes as you expect. But right now it is hard to say, without more information, whether your system performs operations 1 & 2 the most, or operations 3 & 4.
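    The measurement step suggested above can be prototyped before touching memcached at all. A minimal sketch (all names are my own; a real deployment would instead read memcached's get_hits/get_misses stats) that records the hit rate:

```python
class CountingCache:
    """Toy in-memory cache that tracks hits and misses, as a stand-in
    for memcached's get_hits / get_misses statistics."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # A hit means the cache saved us a database query.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

    If the observed hit rate stays low (operations 3 & 4 dominate: unique filters everywhere, or constant updates wiping the keys), the no-cache option wins; a high hit rate justifies paying the invalidation cost.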
