
I need to split a very large JSON file (20 GB) into multiple smaller JSON files (say, with a threshold of 100 MB per file).

The example file layout looks like this.

file.json

[{"name":"Joe", "Place":"Denver", "phone_number":["980283", "980284", "980285"]}, {"name":"kruger", "Place":"boston", "phone_number":["980281", "980282", "980283"]}, {"name":"Dan", "Place":"Texas", "phone_number":["980286", "980287", "980286"]}, {"name":"Kyle", "Place":"Newyork", "phone_number":["980282", "980288", "980289"]}]

The output should look like this:

file1:

[{"name":"Joe", "Place":"Denver", "phone_number":["980283", "980284", "980285"]}, {"name":"kruger", "Place":"boston", "phone_number":["980281", "980282", "980283"]}]

file2:

[{"name":"Dan", "Place":"Texas","phone_number":["980286", "980287", "980286"]}, {"name":"Kyle", "Place":"Newyork", "phone_number":["980282", "980288", "980289"]}]

What is the best way to achieve this? Should I opt for a shell command or Python?

2 Answers


  1. The Python module json-stream can do this, with a few caveats, which I’ll get to later.

    You’ll have to implement the visitor pattern.

    import json_stream
    
    def visitor(item, path):
        print(f"{item} at path {path}")
    
    with open('mylargejsonfile.json','r') as f:
        json_stream.visit(f, visitor)
    

    This visitor function gets called once for each complete JSON element encountered, in a depth-first manner. So every complete JSON element (number, string, array, object, etc.) will invoke the callback, and it is up to you to decide at which point to pause processing and write a partial file out.

    One thing to look out for: if your input file is a single JSON element (such as one large dictionary), you will have to change the output structure if you want the split-up files to also be valid JSON.

    An illustrative example: try to split the JSON file
    { "top" : [1,2,3] } into two separate files of half the size. You can't without changing the data structure.

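    To make the "pause and write a partial file" step concrete, here is a sketch of the chunking decision using only the stdlib `json` module rather than json-stream: `JSONDecoder.raw_decode` pulls one array element at a time out of a string, so you can accumulate elements until a size threshold is reached and then start a new chunk. The function name `split_json_array` and the `max_bytes` parameter are made up for illustration, and a real 20 GB file would also need buffered reads instead of one in-memory string.

    ```python
    import json

    def split_json_array(text, max_bytes):
        """Split the serialized JSON array in `text` into chunks whose
        serialized size stays under max_bytes (illustrative sketch)."""
        decoder = json.JSONDecoder()
        idx = text.index('[') + 1          # step past the opening bracket
        chunks, current, size = [], [], 2  # 2 bytes for "[" and "]"
        while True:
            # skip whitespace and the commas between elements
            while idx < len(text) and text[idx] in ' \t\r\n,':
                idx += 1
            if idx >= len(text) or text[idx] == ']':
                break
            obj, idx = decoder.raw_decode(text, idx)  # parse ONE element
            piece = json.dumps(obj)
            # start a new chunk if adding this element would cross the limit
            if current and size + len(piece) + 2 > max_bytes:
                chunks.append(current)
                current, size = [], 2
            current.append(obj)
            size += len(piece) + 2
        if current:
            chunks.append(current)
        return chunks
    ```

    Each returned chunk is a plain Python list, so writing `json.dump(chunk, out)` per chunk produces the valid-JSON split files shown in the question.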
  2. As long as the file is structured that way, with one item per line and no element of the main list being itself a list, you can just do a basic string replacement with sed. This is fragile, but relatively fast and memory-efficient, since sed is designed for streaming text.

    Here is an example modifying "file.json" in-place:

    sed -e 's/^\[//' -e 's/, *$//' -e 's/\]$//' -i file.json
    

    Then each line can be written to a separate file using a basic bash loop with read.

    To process the input file without modifying it and write the target files, you can do this:

    i=1
    sed -e 's/^\[//' -e 's/, *$//' -e 's/\]$//' file.json | while read -r line; do
        echo "[$line]" > "file$i"
        i=$((i+1))
    done
    

    For the example file, it creates two files: file1 and file2
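    If you would rather stay in Python than depend on sed, the same per-line idea can be sketched in pure stdlib code, under the same assumption of one array element per line. The function name split_per_line is made up for illustration:

    ```python
    import json

    def split_per_line(lines):
        """Turn each line of a one-element-per-line JSON array into its
        own single-element JSON array string (same idea as the sed pipeline)."""
        out = []
        for raw in lines:
            s = raw.strip()
            if s.startswith('['):   # opening bracket of the whole array
                s = s[1:].lstrip()
            if s.endswith(','):     # comma separating elements
                s = s[:-1].rstrip()
            if s.endswith(']'):     # closing bracket of the whole array
                s = s[:-1].rstrip()
            if s:
                out.append('[' + s + ']')
        return out
    ```

    Iterating over the open file instead of a list of lines keeps memory use constant; each returned string can then be written to file1, file2, and so on.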
