I’m working on a multiprocessing Python application where multiple processes need access to a large, pre-loaded spaCy NLP model (e.g., en_core_web_lg). Because the model is memory-intensive and read-only, I want to avoid loading it separately in each process; doing so quickly exhausts main memory. Instead, I’d like to load it once in a shared location so that all processes can read from it without duplicating memory usage.
I have looked into multiprocessing.Manager and multiprocessing.shared_memory, but these approaches seem better suited to NumPy arrays, raw data buffers, or simple objects, not complex objects with internal references like an NLP model. I have also looked into MPI’s MPI.Win.Allocate_shared(), but I ran into the same issues. Using a Redis server and making rank 0 do all the processing works with MPI, but since a single rank then does all the work, it defeats the purpose I had for using multiprocessing.
- Is there an efficient way to share a spaCy model instance across multiple processes in Python to avoid reloading it for each process?
- Are there libraries or techniques specifically suited for sharing complex, read-only objects like NLP models in memory across processes?
- If multiprocessing.Manager or shared_memory is viable here, are there ways to improve performance or reduce memory overhead when working with complex objects?
Any suggestions or examples would be greatly appreciated! Thank you!
2 Answers
I would strongly advise you not to treat NLP models like any other Python object. I would always prefer to load an NLP model behind a microservice, which is more in line with ML/software-engineering best practice because it separates the model logic from the main application.
Instead of loading the model in each process (which is memory-intensive), the model is loaded just once in a dedicated service. Multiple parts of the application can then use it without duplicating memory, which addresses your memory concern and also makes the setup more modular and scalable.
An example of implementing such a microservice using FastAPI + Docker could look like this:
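A minimal sketch of such a service follows; the endpoint path, request schema, and the en_core_web_lg model are illustrative choices, not requirements.

```python
# app.py - minimal FastAPI service wrapping a spaCy model.
import spacy
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; every request handled by this process reuses it.
nlp = spacy.load("en_core_web_lg")

class TextRequest(BaseModel):
    text: str

@app.post("/entities")
def extract_entities(req: TextRequest):
    doc = nlp(req.text)
    return {"entities": [(ent.text, ent.label_) for ent in doc.ents]}
```

Your worker processes then call this endpoint over HTTP instead of loading the model themselves, so only the service process holds the model in memory.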
To containerize the above FastAPI service:
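A rough Dockerfile could look like the sketch below; the base image, file names, and port are assumptions.

```dockerfile
# Dockerfile - builds an image with the model baked in, so the service loads it only once at startup.
FROM python:3.11-slim

WORKDIR /app

# Install the service dependencies and download the spaCy model into the image.
RUN pip install --no-cache-dir fastapi uvicorn spacy \
    && python -m spacy download en_core_web_lg

COPY app.py .

EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```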
Python makes this quite difficult to solve well. We do have built-in support for multiprocessing (https://spacy.io/usage/processing-pipelines#multiprocessing). However, if you’re invoking spaCy on small batches of documents, this doesn’t work well.
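For reference, the built-in approach from that page boils down to nlp.pipe with n_process; the model name, worker count, and batch size below are placeholders.

```python
import spacy

nlp = spacy.load("en_core_web_lg")
texts = ["First document.", "Second document."]  # in practice, a large iterable of texts

# nlp.pipe with n_process > 1 runs the pipeline in multiple worker processes and
# streams batches of documents to them; the overhead only pays off on large batches.
for doc in nlp.pipe(texts, n_process=4, batch_size=1000):
    print([(ent.text, ent.label_) for ent in doc.ents])
```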
If the built-in multiprocessing doesn’t work, my recommendation would be to give up on shared memory parallelism and just accept that each process you run takes more memory than you’d like. spaCy still works out to be fairly cheap to run overall.