I’m working on a project where I need to process a JSON file that’s over 5.5 GB in size.
I’ve tried using json.load() from the json module, but it loads the entire file into memory, which isn’t practical for this size.
Thank you
2 Answers
You might want to use the ijson library for processing large JSON files in Python without running into memory issues. Here is a detailed article on how to use it.
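For illustration, here is a minimal sketch using ijson's iterator API. It assumes the file contains a top-level JSON array and is named data.json (both are assumptions, not from the question); the "item" prefix matches each element of that array, so only one element is held in memory at a time.

```python
import ijson

# Open in binary mode; ijson parses the file incrementally.
with open("data.json", "rb") as f:  # hypothetical filename
    # The "item" prefix yields each element of a top-level array,
    # one at a time, without loading the whole document.
    for record in ijson.items(f, "item"):
        print(record)  # replace with your own processing
```

If your top level is an object rather than an array, you would adjust the prefix (for example, "results.item" for an array stored under a "results" key).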
Rather than decoding the whole document and then working with the data, you need a streaming decoder. This treats the JSON as a "stream" of data, so you process one piece at a time.
One example is json-stream, which has several modes. The simplest is transient mode, which reads the JSON without keeping the whole document in memory. This is useful when you're iterating over a large array or dictionary, as in the sketch below.
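A minimal sketch of transient mode, assuming a top-level array and a file named data.json (the filename is an assumption). Transient mode is json-stream's default: data is parsed lazily, and items already consumed are discarded, so memory use stays roughly constant.

```python
import json_stream

with open("data.json") as f:  # hypothetical filename
    # load() returns a transient, lazily-parsed view of the document.
    data = json_stream.load(f)
    # Each item must be consumed in order; once the stream moves on,
    # earlier items are no longer available.
    for item in data:
        print(item)  # replace with your own processing
```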
For more complex data, use the visitor pattern, where you pass in a function that is called for each piece of data.
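A sketch using json-stream's visitor API: the visitor function is called once for each terminal value (string, number, boolean, null) along with the path of keys and indices leading to it, so you never hold the full document. The filename is again an assumption.

```python
import json_stream

def visitor(item, path):
    # path is a tuple of dict keys and list indices locating this value.
    print(f"{item} at path {path}")

with open("data.json") as f:  # hypothetical filename
    json_stream.visit(f, visitor)
```

This works well when the structure is deeply nested or irregular, since you decide what to keep based on the path rather than navigating the whole tree yourself.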