Suppose you have a JSON file like this:
    {
        "a": 0
    }
    {
        "a": 1
    }
It’s not JSONL, because each object takes more than one line. But it’s not a single valid JSON object either. It’s sequentially listed pretty-printed JSON objects.
`json.loads` in Python gives an error about invalid formatting if you attempt to load this, and the documentation indicates it only loads a single object. But tools like `jq` can read this kind of data without issue.
Is there some reasonable way to work with data formatted like this using the core json library? I have an issue where I have some complex objects and while just formatting the data as JSONL works, for readability it would be better to store the data like this. I can wrap everything in a list to make it a single JSON object, but that has downsides like requiring reading the whole file in at once.
There’s a similar question here, but despite the title the data there isn’t JSON at all.
3 Answers
You can partially decode text as JSON with `json.JSONDecoder.raw_decode`. This method returns a 2-tuple of the parsed object and the ending index of the object in the string, which you can then use as the starting index to partially decode the text for the next JSON object.
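A minimal sketch of the loop described above, assuming the whole file has already been read into a string. The helper name `iter_json_objects` and the sample text are illustrative and not taken from the original answer. Because `raw_decode` does not skip leading whitespace itself, the sketch steps past it by hand before each call.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value that appears back to back in text."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode expects the value to start exactly at idx,
        # so move past any whitespace between objects first.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # Returns the parsed value and the index just past it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


sample = '{\n  "a": 0\n}\n{\n  "a": 1\n}\n'
objects = list(iter_json_objects(sample))
print(objects)  # [{'a': 0}, {'a': 1}]
```

Since this is a generator, each object can be processed as soon as it is parsed, although the text itself is still held in memory.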
Note that the starting index as the second argument to `json.JSONDecoder.raw_decode` is an implementation detail, and that if you want to stick to the publicly documented API you would have to use the less efficient approach of slicing the string (which involves copying the string) from the index before you pass it to `raw_decode`.
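The slicing variant might look roughly like the following sketch, which only ever calls `raw_decode` with a single argument. The function name `iter_json_objects_sliced` is hypothetical, and the input is again assumed to be one in-memory string.

```python
import json


def iter_json_objects_sliced(text):
    """Yield JSON values one at a time, re-slicing the text after each."""
    decoder = json.JSONDecoder()
    # Strip leading whitespace so the value starts at index 0.
    remaining = text.lstrip()
    while remaining:
        obj, end = decoder.raw_decode(remaining)
        yield obj
        # Each slice copies the rest of the string, which is the
        # extra cost compared with passing a start index.
        remaining = remaining[end:].lstrip()


sample = '{\n  "a": 0\n}\n{\n  "a": 1\n}\n'
sliced_objects = list(iter_json_objects_sliced(sample))
print(sliced_objects)  # [{'a': 0}, {'a': 1}]
```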
Here is a way: attempt `json.loads()`, then use `error.pos` from the raised exception to find where the first object ends.
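One possible reading of this approach, as a sketch rather than the answer's original code: when `json.loads` fails because more data follows the first value, the `pos` attribute of `json.JSONDecodeError` points at the start of the leftover text, so everything before it can be parsed on its own. The function name `load_concatenated` is an assumption made for illustration.

```python
import json


def load_concatenated(text):
    """Parse back-to-back JSON values using the error position as a split point."""
    objects = []
    remaining = text.strip()
    while remaining:
        try:
            objects.append(json.loads(remaining))
            break  # the rest was a single valid document
        except json.JSONDecodeError as error:
            head = remaining[:error.pos]
            # If head is not valid JSON by itself (a genuine syntax
            # error), this call raises and the error propagates.
            objects.append(json.loads(head))
            remaining = remaining[error.pos:].strip()
    return objects


sample = '{\n  "a": 0\n}\n{\n  "a": 1\n}\n'
loaded = load_concatenated(sample)
print(loaded)  # [{'a': 0}, {'a': 1}]
```

This re-parses the remaining text on every iteration, so it is likely to be slower than `raw_decode` on files with many objects.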
The best option would be to use the built-in Python library `pprint` to print `stuff`, where `stuff` is a dictionary object.
If the JSON is in a string, you can load it using `stuff = json.loads(json_string)`; otherwise, if it is in a file, you can use `stuff = json.load(file_obj)`, where `file_obj` is an open file object rather than a path. As far as printing is concerned, `pprint` will do the job for you.
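A small sketch of what this answer seems to describe, assuming a single JSON document. The sample text is made up, and `io.StringIO` stands in for a file opened with `open(...)`. Note that `pprint` shows the Python `repr` of the data (single quotes, `None`), not JSON syntax.

```python
import io
import json
from pprint import pformat

text = '{"a": 0, "b": [1, 2, 3]}'

# From a string: json.loads
stuff = json.loads(text)

# From a file: json.load takes an open file object, not a path.
with io.StringIO(text) as file_obj:
    stuff_from_file = json.load(file_obj)

# pformat returns the same text that pprint would print.
formatted = pformat(stuff)
print(formatted)  # {'a': 0, 'b': [1, 2, 3]}
```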