I’d like to share some large Python objects in Django. They are just big tables of data that I’d like to access quickly and at random, in memory. Think of just reading a dict that’s, say, 35M on disk. So, not huge, not small. I’m treating them as immutable: read in on initialization, never changed. I’m willing to restart the server to pick up changes.
What is the best, most Django-friendly way to do this?
This question is like mine. This answer describes how to use Django’s low-level in-memory cache. Reading the documentation, there is an in-memory cache that is in-process and thread-safe. Perfect. However, it only stores objects that can be pickled. I don’t want my 35M Python object pickled; that seems awkward. And does getting it back out unpickle it again? On every request? That sounds slow.
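For reference, this is roughly what that low-level cache approach would look like, as a sketch based on the Django cache docs (the cache key `big_table` and the sample dict are my own placeholders):

```python
# settings.py -- Django's in-process local-memory cache (one cache per process)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}
```

```python
# Anywhere in the app: LocMemCache pickles on set() and unpickles on get()
from django.core.cache import cache

cache.set("big_table", {"some_key": 123}, timeout=None)  # timeout=None caches forever
table = cache.get("big_table")
```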
This blog post mentions django-lrucache-backend, which skips the pickling. However, it was last updated 2 years ago, and the post also says not to use it for "large data tables" (not sure why).
Recommendations?
EDIT: I understand the traditional answer, but I’d rather avoid pickling and Redis, for two reasons: 1) I’d rather not write extra code (pickling) or maintain another component (Redis); 2) unpickling large objects seems slow (and does it happen on every request?).
2 Answers
I ended up hanging my data off of the Django AppConfig object, specifically the ready method.
Others also seem to do this, for example here. That example didn't use the ready method, but it did use AppConfig.
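A minimal sketch of that approach, assuming an app called myapp, a data file at myapp/data/lookup_table.json, and an attribute named lookup_table (all of these names are my own assumptions, not from the linked example):

```python
# myapp/apps.py
import json
from pathlib import Path

from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        # Runs once per process at startup: load the big, immutable dict
        # into memory and hang it off the AppConfig instance.
        data_file = Path(__file__).resolve().parent / "data" / "lookup_table.json"
        self.lookup_table = json.loads(data_file.read_text())
```

```python
# Anywhere in request-handling code: fetch the already-loaded dict,
# no pickling or per-request deserialization involved.
from django.apps import apps

table = apps.get_app_config("myapp").lookup_table
value = table.get("some_key")
```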
Depending on the object you want to store, you may need to pickle and unpickle it, but that is not a performance issue. You have two possibilities: if it is a dict, you can use a JSON structure; otherwise, just use django-redis as the cache backend and let Django store the object in the cache (Redis). django-redis also supports connection pooling.
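A sketch of that setup, with the django-redis backend and connection pooling enabled (the Redis URL and pool size below are placeholder values):

```python
# settings.py -- django-redis as the default cache backend
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # Reuse connections instead of opening one per request.
            "CONNECTION_POOL_KWARGS": {"max_connections": 50},
        },
    }
}
```

With that in place, the object is stored and fetched through the normal cache API, e.g. `cache.set("big_table", big_dict, timeout=None)` and `cache.get("big_table")`.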