Is there a way to read sequentially pretty-printed JSON objects in Python?

polm23
July 19, 2024

Suppose you have a JSON file like this:

{
  "a": 0
}
{
  "a": 1
}

It’s not JSONL, because each object takes more than one line. But it’s not a single valid JSON object either. It’s sequentially listed pretty-printed JSON objects.

json.loads in Python raises a json.JSONDecodeError ("Extra data") if you attempt to load this, and the documentation indicates it only loads a single object. But tools like jq can read this kind of data without issue.
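To make the failure concrete, here is a minimal reproduction using the sample data above. The variable names are only illustrative, and the exact message text is what current CPython produces rather than a documented guarantee.

```python
import json

text = '{\n  "a": 0\n}\n{\n  "a": 1\n}\n'

caught = None
try:
    json.loads(text)
except json.JSONDecodeError as error:
    # The first object parses fine; the complaint is about what follows it.
    caught = error

print(caught.msg)  # the unformatted error message
print(caught.pos)  # index in `text` where the unexpected extra input begins
```

The position reported is the start of the second object, which is what makes splitting the text at that point possible.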

Is there some reasonable way to work with data formatted like this using the core json library? I have an issue where I have some complex objects and while just formatting the data as JSONL works, for readability it would be better to store the data like this. I can wrap everything in a list to make it a single JSON object, but that has downsides like requiring reading the whole file in at once.

There’s a similar question here, but despite the title the data there isn’t JSON at all.

Answers

- blhsing
- July 19, 2024 at 7:33 am
- 0 votes
You can partially decode text as JSON with json.JSONDecoder.raw_decode. This method returns a 2-tuple of the parsed object and the ending index of the object in the string, which you can then use as the starting index to partially decode the text for the next JSON object:
```
import json

def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    while (index := jsons.find('{', index)) != -1:
        data, index = decoder.raw_decode(jsons, index)
        yield data
```
so that:
```
jsons = '''
{
  "a": 0
}
{
  "a": 1
}'''
for j in iter_jsons(jsons):
    print(j)
```
outputs:
```
{'a': 0}
{'a': 1}
```

Note that passing the starting index as the second argument to json.JSONDecoder.raw_decode is an implementation detail: the documentation only describes raw_decode(s). If you want to stick to the publicly documented API, you have to use the less efficient approach of slicing the string from the index (which copies the remainder of the string each time) before you pass it to raw_decode:
```
def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    while (index := jsons.find('{', index)) != -1:
        data, index = decoder.raw_decode(jsons := jsons[index:])
        yield data
```
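Since the question mentions not wanting to read the whole file at once, the same raw_decode idea can be applied to a buffer that is refilled in chunks. The following is a sketch only, not part of the original answer: iter_json_stream and chunk_size are hypothetical names, it uses only the documented raw_decode(s) form, and it assumes the top-level values are objects or arrays (a bare number split across two chunks could be decoded too early).

```python
import io
import json

def iter_json_stream(fp, chunk_size=65536, decoder=json.JSONDecoder()):
    """Yield top-level JSON values from a text file object, reading in chunks."""
    buffer = ""
    eof = False
    while True:
        buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
        if buffer:
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # nothing more to read: input is malformed or truncated
                # otherwise the value is probably incomplete; fall through and read more
            else:
                yield obj
                buffer = buffer[end:]
                continue
        if eof:
            return
        chunk = fp.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True

# Usage with an in-memory file; a real file opened in text mode works the same way.
sample = io.StringIO('{\n  "a": 0\n}\n{\n  "a": 1\n}\n')
for obj in iter_json_stream(sample, chunk_size=4):
    print(obj)
```

Each failed attempt re-parses the buffer from its start, so a very small chunk_size combined with very large objects will be slow; the default here is an arbitrary choice.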

Here is a way: attempt json.loads() on the remaining text, then:

- If it succeeds, we are at the end of the string.
- If it fails, the text before error.pos is a complete object: load that part and repeat with the rest.

Code:

```
import json

text = """
{
  "a": 0
}
{
  "a": 1
}
"""

obj_list = []
while True:
    try:
        obj_list.append(json.loads(text))
        # Success means we have reached the end of the string
        break
    except json.decoder.JSONDecodeError as error:
        # error.pos is where the error happens within the text
        valid_text, text = text[:error.pos], text[error.pos:]
        obj_list.append(json.loads(valid_text))

print(obj_list)
```
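The same error.pos idea can be wrapped in a generator that only splits when the error really is trailing input, so that genuine syntax errors are not mistaken for object boundaries. This is a sketch under an assumption: it compares error.msg with "Extra data", which is the message current CPython uses but not a documented contract. The name iter_by_error_pos is hypothetical.

```python
import json

def iter_by_error_pos(text):
    """Yield consecutive JSON values by splitting at each 'Extra data' position."""
    while text.strip():
        try:
            obj = json.loads(text)
        except json.JSONDecodeError as error:
            # Anything other than "Extra data" is a real syntax error: re-raise it.
            if error.msg != "Extra data":
                raise
            # Everything before error.pos is one complete value.
            head, text = text[:error.pos], text[error.pos:]
            obj = json.loads(head)
        else:
            text = ""  # the remainder was a single value, so we are done
        yield obj

for obj in iter_by_error_pos('\n{\n  "a": 0\n}\n{\n  "a": 1\n}\n'):
    print(obj)
```

Note that every value except the last is parsed twice and the remaining text is copied on each split, so this is best suited to small inputs.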

- Abhimanyu
- July 19, 2024 at 9:39 am
- 0 votes
The best option would be to use the built-in Python library pprint. Here, stuff is a dictionary object.

If the JSON is stored in a file, you can load it with stuff = json.load(f), where f is an open file object (json.load takes a file object, not a path). As far as printing is concerned, pprint will do the job for you:
```
pprint.pp(stuff)
```
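One caveat worth illustrating: pprint.pp prints the Python repr of the object (single-quoted keys), which is not JSON. To produce the sequential pretty-printed layout from the question, json.dumps with indent is the usual tool. A minimal sketch, where records and document are hypothetical names:

```python
import json
import pprint

records = [{"a": 0}, {"a": 1}]

# pprint shows a Python literal, not JSON text
pprint.pp(records[0])

# json.dumps(indent=2) emits pretty-printed JSON; join the objects with newlines
document = "\n".join(json.dumps(record, indent=2) for record in records)
print(document)
```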