
I’m working on a project where I need to process a JSON file that’s over 5.5 GB in size.

I’ve tried using json.load() from the json module, but it loads the entire file into memory, which isn’t practical for this size.

Thank you

2 Answers


  1. You might want to use the ijson library for processing large JSON files in Python without running into memory issues. Here is a detailed article on how to use it. A minimal sketch of typical usage is shown below.

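    As an illustration of the idea (not taken from the linked article), here is a minimal sketch with ijson, assuming the top-level document is a JSON array and using a placeholder filename:

    import ijson

    # Stream the elements of a top-level JSON array one at a time,
    # so the whole multi-gigabyte document is never held in memory.
    # "large_file.json" is a placeholder path.
    with open("large_file.json", "rb") as f:
        for record in ijson.items(f, "item"):  # prefix "item" matches each array element
            print(record)

    ijson also exposes a lower-level event interface, ijson.parse(), which yields (prefix, event, value) tuples if you need to react to individual keys as they arrive.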
  2. Rather than trying to decode the whole document and then work with the data, you need to use a streaming decoder. This treats the JSON as a "stream" of data, so you work on one piece of it at a time.

    One example is json-stream, which has several modes. The simplest is transient mode, which reads the JSON but doesn’t store the whole document. This is useful if you’re reading a large array or dictionary.

    import json_stream

    # JSON: [1, 2, 3, 4, 5, ...]
    # Open the large file lazily; the filename is a placeholder.
    with open("large_file.json") as f:
        nums = json_stream.load(f)  # transient mode: nothing is kept in memory

        for num in nums:
            print(num)

    For more complex data, use the visitor pattern where you pass in a function to handle each piece of data.

    import json_stream

    # JSON: {"x": 1, "y": {}, "xxxx": [1,2, {"yyyy": 1}, "z", 1, []]}

    def visitor(item, path):
        print(f"{item} at path {path}")

    # The filename is a placeholder.
    with open("large_file.json") as f:
        json_stream.visit(f, visitor)
    
    This prints:

    1 at path ('x',)
    {} at path ('y',)
    1 at path ('xxxx', 0)
    2 at path ('xxxx', 1)
    1 at path ('xxxx', 2, 'yyyy')
    z at path ('xxxx', 3)
    1 at path ('xxxx', 4)
    [] at path ('xxxx', 5)
    