
Suppose you have a JSON file like this:

{
  "a": 0
}
{
  "a": 1
}

It’s not JSONL, because each object takes more than one line. But it’s not a single valid JSON object either. It’s sequentially listed pretty-printed JSON objects.

json.loads in Python gives an error about invalid formatting if you attempt to load this, and the documentation indicates it only loads a single object. But tools like jq can read this kind of data without issue.

Is there some reasonable way to work with data formatted like this using the core json library? I have an issue where I have some complex objects and while just formatting the data as JSONL works, for readability it would be better to store the data like this. I can wrap everything in a list to make it a single JSON object, but that has downsides like requiring reading the whole file in at once.

There’s a similar question here, but despite the title the data there isn’t JSON at all.

3 Answers


  1. You can partially decode text as JSON with json.JSONDecoder.raw_decode. This method returns a 2-tuple of the parsed object and the ending index of the object in the string, which you can then use as the starting index to partially decode the text for the next JSON object:

    import json
    
    def iter_jsons(jsons, decoder=json.JSONDecoder()):
        index = 0
        while (index := jsons.find('{', index)) != -1:
            data, index = decoder.raw_decode(jsons, index)
            yield data
    

    so that:

    jsons = '''
    {
      "a": 0
    }
    {
      "a": 1
    }'''
    for j in iter_jsons(jsons):
        print(j)
    

    outputs:

    {'a': 0}
    {'a': 1}
    


    Note that the starting index as the second argument to json.JSONDecoder.raw_decode is an implementation detail, and that if you want to stick to the publicly documented API you would have to use the less efficient approach of slicing the string (which involves copying the string) from the index before you pass it to raw_decode:

    def iter_jsons(jsons, decoder=json.JSONDecoder()):
        index = 0
        while (index := jsons.find('{', index)) != -1:
            data, index = decoder.raw_decode(jsons := jsons[index:])
            yield data
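
The question also asks about avoiding reading the whole file at once. As a rough sketch only, the documented one-argument form of raw_decode can be combined with chunked reads. The names `iter_json_stream` and `chunk_size` are hypothetical, and the sketch assumes every top-level value is an object or array, so that a truncated value always fails to parse instead of being mistaken for a shorter valid one:

```python
import io
import json


def iter_json_stream(fp, chunk_size=65536, decoder=json.JSONDecoder()):
    """Yield top-level JSON values from a text file object, one at a time.

    Hypothetical helper for illustration; not part of the json module.
    Assumes top-level values are objects or arrays.
    """
    buffer = ""
    while True:
        chunk = fp.read(chunk_size)
        at_eof = chunk == ""  # text-mode read() returns "" only at end of file
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            buffer = buffer.lstrip(" \t\r\n")
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if at_eof:
                    raise  # malformed or truncated input
                break  # value is probably incomplete: read another chunk
            yield obj
            buffer = buffer[end:]  # drop the text that was just decoded
        if at_eof:
            return


# Usage with an in-memory file and a deliberately tiny chunk size.
sample = io.StringIO('{\n  "a": 0\n}\n{\n  "a": 1\n}\n')
for obj in iter_json_stream(sample, chunk_size=8):
    print(obj)
```

Only the undecoded tail of the input is held in memory, but an object that spans many chunks is re-parsed after each read, and malformed input is only reported once the rest of the file has been buffered. Treat this as a starting point, not a tuned implementation.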
    
  2. Here is a way: attempt json.loads() on the whole text, then

    1. If it succeeds, we have reached the end of the string.
    2. If it fails, load the object that ends at the error position, error.pos, and repeat with the rest of the text.

    Code:

    import json
    
    text = """
    {
      "a": 0
    }
    {
      "a": 1
    }
    """
    
    obj_list = []
    while True:
        try:
            obj_list.append(json.loads(text))
            # Success means we have reached the end of the string
            break
        except json.decoder.JSONDecodeError as error:
            # error.pos is where the error happens within the text
            valid_text, text = text[:error.pos], text[error.pos:]
            obj_list.append(json.loads(valid_text))
    
    print(obj_list)
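
To make the role of error.pos concrete, here is a small sketch that inspects a single failed json.loads call on the sample data. The variable names `first` and `rest` are made up for illustration:

```python
import json

text = '{\n  "a": 0\n}\n{\n  "a": 1\n}\n'

try:
    json.loads(text)
except json.JSONDecodeError as error:
    # error.pos is the index of the first character the parser could not
    # accept, which here is the opening brace of the second object.
    first, rest = text[:error.pos], text[error.pos:]

print(repr(first))  # the first object plus its trailing newline
print(repr(rest))   # everything from the second object onward
```

Each half is now valid JSON by itself, which is why the loop above can parse `first` immediately and retry on `rest`. The cost is that every object is parsed twice and the remaining text is copied on each split.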
    
  3. The best option would be to use the built-in Python library pprint. Here, stuff is a dictionary object. If the JSON is stored in a file, you can load it by passing the open file object (not the path itself) to json.load:

    with open(file_path) as f:
        stuff = json.load(f)

    As far as printing is concerned, pprint will do the job for you:

    pprint.pp(stuff)
    