
I have a Spring/Java project in which I am thinking about caching some of the functions that access databases and other REST services. I am not sure at which point I should cache a given flow. For example, if I have the following functions:

public List<String> getSchools()
{
     //call db to get names
}

public List<String> getCourses(String school)
{
    //rest call to get courses for a school
}

public List<String> getTeachers(String course)
{
    //call db to get teacher names for a course
}

/*Uses above three functions together*/
public List<String> getAllTeachers()
{
      List<String> schools = getSchools();
      List<String> courses = new ArrayList<String>();
      List<String> teachers = new ArrayList<String>();

      for(String s : schools)
          courses.addAll( getCourses( s ) );
      for(String c : courses)
         teachers.addAll( getTeachers( c ) );

     return teachers;
}

Here, which methods should I cache? Should I cache the function that calls the other three resource-heavy functions, or should I cache the three functions individually? What would generally be considered good practice?


Answers


  1. Caching is normally used for data that does not change often.

    So whether to cache the results of the individual methods or the result of getAllTeachers() depends on how often the data related to teachers/courses/schools changes.

    I would suggest caching the results of the individual functions (rather than the one function that calls all three) and setting a time gap: once it has elapsed, the next call to that function discards the old data and reloads the latest data from the DB.

    For example, if the time gap is 10 minutes, a call made to one of these functions more than 10 minutes after its last refresh returns the latest data.

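        // needs java.util.HashMap, java.util.List, java.util.Map
        private final Map<String, Long> funcCalls = new HashMap<>();               // method name -> time of last refresh (ms)
        private final Map<String, List<String>> funcResultCache = new HashMap<>(); // method name -> cached result
        private final long timeGap = 600000; // 10 mins
        public synchronized List<String> getSchools() // synchronized: the plain HashMaps are not thread-safe
        {
           long now = System.currentTimeMillis();
           Long lastRefresh = funcCalls.get("getSchools");
           if (lastRefresh == null || now - lastRefresh >= timeGap) {
              //call db to get names (loadSchoolsFromDb() is a hypothetical helper)
              funcResultCache.put("getSchools", loadSchoolsFromDb());
              funcCalls.put("getSchools", now);
           }
           return funcResultCache.get("getSchools");
        }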
    

    One disadvantage of this approach is that users can get stale data: if the data changes in the 2nd minute, the change is reflected only after the 10 minutes are up.

    The time-gap can be shortened to reduce this side-effect.
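    The per-method time-gap idea can also be packaged as one small reusable class, so that getSchools(), getCourses(school) and getTeachers(course) each get their own cache, keyed by argument. This is a minimal sketch assuming a single JVM and plain Java: `TimedCache` and its injectable clock are hypothetical names, not a Spring API (in a Spring project, the built-in `@Cacheable` abstraction backed by a cache provider that supports expiry would be the more usual route).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper: caches one loader's results per key (e.g. per school)
// and reloads an entry once it is older than timeGapMillis.
class TimedCache<K, V> {
    // A cached value together with the time (ms) it was loaded.
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // the real DB or REST call
    private final long timeGapMillis;
    private final LongSupplier clock;     // injectable so time can be simulated

    TimedCache(Function<K, V> loader, long timeGapMillis, LongSupplier clock) {
        this.loader = loader;
        this.timeGapMillis = timeGapMillis;
        this.clock = clock;
    }

    TimedCache(Function<K, V> loader, long timeGapMillis) {
        this(loader, timeGapMillis, System::currentTimeMillis);
    }

    V get(K key) {
        long now = clock.getAsLong();
        // ConcurrentHashMap.compute() is atomic per key, so two callers
        // cannot reload the same key at the same time.
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old == null || now - old.loadedAt >= timeGapMillis)
                        ? new Entry<>(loader.apply(k), now) // missing or stale: reload
                        : old);                              // still fresh: reuse
        return entry.value;
    }
}
```

    In the question's service, getCourses(school) could then simply return `coursesCache.get(school)` (with `coursesCache` a hypothetical `TimedCache<String, List<String>>` wrapping the REST call), and getAllTeachers() benefits from the per-method caches without needing a cache of its own.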

    There is another way, though it is a little more cumbersome.
    Assuming the calls to the database retrieve a large amount of data:
    1) Put triggers on the tables for DML statements (insert/update/delete) that write the current time in milliseconds to another table (say trans_tbl).

    2) Initially, read and store the time held in trans_tbl.

    3) Before every call to the DB for the actual data, read trans_tbl and check whether its time is later than the one you stored. If yes, store the new time, then fire the query to get the data from the DB, cache the results and return them. If not, return the cached results.
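    Steps 2 and 3 can be sketched as a cache that remembers the last change time it saw and reruns the expensive query only when the time in trans_tbl has moved forward. This is a minimal sketch: `ChangeAwareCache` is a hypothetical name, and the two suppliers stand in for the cheap query against trans_tbl and the expensive data query; no real JDBC code is shown.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: reload only when the time stored in trans_tbl advances.
class ChangeAwareCache<V> {
    private final LongSupplier lastChangeQuery; // stand-in for reading the time from trans_tbl
    private final Supplier<V> dataQuery;        // stand-in for the expensive data query
    private long knownChangeTime = Long.MIN_VALUE;
    private V cached;                           // null until the first load

    ChangeAwareCache(LongSupplier lastChangeQuery, Supplier<V> dataQuery) {
        this.lastChangeQuery = lastChangeQuery;
        this.dataQuery = dataQuery;
    }

    synchronized V get() {
        long latest = lastChangeQuery.getAsLong(); // cheap check against trans_tbl
        if (cached == null || latest > knownChangeTime) {
            knownChangeTime = latest;              // store the latest time first...
            cached = dataQuery.get();              // ...then fire the real query and cache it
        }
        return cached;                             // unchanged: serve from the cache
    }
}
```

    Storing the time before running the data query means that a change committed while the query is running leaves trans_tbl with a later time, so the next call reloads again instead of missing it.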

    Another option to explore is whether the database you are using can itself cache SELECT queries. If it can, that would be an easier approach, because no code changes are required; you just tune the DB to cache SELECT queries.

  2. For this example I would never consider caching, because the data doesn't change often.

    However, what I would do is the following:
    The first time I launch the program, it retrieves whatever it needs and stores a variable called changed in the database with the value 0. On the server side, whenever I insert, update, or delete something, I set this changed value to 1.
    The next time I launch the application, it reads the changed value. If it is 1, the application retrieves the new information from the DB and then sets the value back to 0. If the changed value is already 0, it simply uses the cached results.

    Note that your question is tricky: if you load only the teachers, schools and courses into memory and then want, for example, the list of students who just dropped a course, it is up to you whether to load the students into memory as well or keep them in the database.

    Keep one simple rule in mind: set the value to 1 whenever something changes in the database, and set it back to 0 whenever you query from within the program.

    Last: to avoid a race condition, you may want to reset the value to 0 in the same step as the select that reads it. Don't wait until you finish your job, because other users may be using the application at the same time. Make sure the changed value is stored in a table rather than in a file, and that the table uses InnoDB as its storage engine, because InnoDB locks by row rather than by table.
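    Why the reset should happen as early as possible can be shown with a single-threaded, in-memory simulation. Here an `AtomicBoolean` stands in for the changed row, a list stands in for the teachers table, and a `duringLoad` hook simulates another user's update landing right after our SELECT. `FlagCache` and the hook are hypothetical; in the real setup the read-and-reset would have to be atomic in the database itself (for instance within one transaction on the InnoDB table), which this sketch does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-memory simulation of the "changed" flag scheme.
class FlagCache {
    final List<String> database = new ArrayList<>();       // stand-in for the teachers table
    final AtomicBoolean changed = new AtomicBoolean(true); // true = 1: the cache must reload
    List<String> cache = List.of();
    Runnable duringLoad = () -> {};                        // hook simulating a concurrent write

    // Server-side write path: modify the data, then set changed = 1.
    void insert(String teacher) {
        database.add(teacher);
        changed.set(true);
    }

    // Risky order: reload first, reset the flag only when finished.
    List<String> getResetAfterLoad() {
        if (changed.get()) {
            cache = load();
            changed.set(false); // wipes out a change made while load() was running
        }
        return cache;
    }

    // Safer order: read and reset the flag in one atomic step, then reload.
    List<String> getResetBeforeLoad() {
        if (changed.getAndSet(false)) {
            cache = load();
        }
        return cache;
    }

    private List<String> load() {
        List<String> snapshot = List.copyOf(database); // the "SELECT"
        duringLoad.run();                              // another user writes after our SELECT
        return snapshot;
    }
}
```

    With the risky order, the update made during the load is never picked up, because the flag ends up at 0 while the cache lacks that update. With the safer order, the concurrent write sets the flag back to 1, so the next call reloads.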
