
In Python I need to keep dumping an array of user data, made up of JSON objects (simple format: [{}, {}…]), to a storage file. As the program captures and records user data, the array may grow to a few thousand objects. Each object {} holds up to about fifty key-value pairs.
How would I write a custom function for json.dump(), using the native "default=" kwarg, that starts each object on a new line and adds a blank line between objects? Using the "indent=" kwarg to pretty-print the array makes the file far too long vertically, and without that kwarg I get a dense mass of objects that is hard to read.

To illustrate the problem, take a much shorter array.
example = [{"a":1, "b":2, "c":3,}, {"d":1, "e":2, "f":3,}, {"g":1, "h":2, "i":3}, {"j":6, "k":7, "l":8}]
Without using indent, the output to my .json file is too compact.

[{"a": 1, "b": 2, "c": 3}, {"d": 1, "e": 2, "f": 3}, {"g": 1, "h": 2, "i": 3}, {"j": 6, "k": 7, "l": 8}]

If I use indent, I get a narrow vertical column.

[
  {
    "a": 1,
    "b": 2,
    "c": 3
  },
  {
    "d": 1,
    "e": 2,
    "f": 3
  },
  {
    "g": 1,
    "h": 2,
    "i": 3
  },
  {
    "j": 6,
    "k": 7,
    "l": 8
  }
]
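The two outputs above correspond to json.dumps(example) and json.dumps(example, indent=2); json.dump writes the same text to an open file. A minimal sketch that reproduces both (the indent=2 value is inferred from the two-space indentation shown, not stated in the question):

```python
import json

example = [{"a": 1, "b": 2, "c": 3}, {"d": 1, "e": 2, "f": 3},
           {"g": 1, "h": 2, "i": 3}, {"j": 6, "k": 7, "l": 8}]

# Without indent, the default separators are ", " and ": ": one dense line.
compact = json.dumps(example)

# With indent, every key-value pair gets its own line: the tall, narrow column.
tall = json.dumps(example, indent=2)

print(compact)
print(len(tall.splitlines()), "lines")
```

Neither the indent nor the separators parameter can put a whole object on one line while breaking lines between objects, which is why the layout below needs custom code.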

To minimize whitespace while maintaining readability, I want output in this format (here arranged manually).

[
    {"a": 1, "b": 2, "c": 3},

    {"d": 1, "e": 2, "f": 3},

    {"g": 1, "h": 2, "i": 3}, 

    {"j": 6, "k": 7, "l": 8}
]
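Whitespace between JSON tokens is insignificant, so this hand-arranged layout, blank lines included, is still valid JSON that json.loads reads back into the same list. A quick check, where the string literal simply copies the layout above:

```python
import json

desired = """[
    {"a": 1, "b": 2, "c": 3},

    {"d": 1, "e": 2, "f": 3},

    {"g": 1, "h": 2, "i": 3},

    {"j": 6, "k": 7, "l": 8}
]"""

# Indentation and blank lines between objects are ignored by the parser.
parsed = json.loads(desired)
print(parsed)
```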

With up to fifty key-value pairs per object, this format, even with line wrapping, would be both compact and readable. I’m surprised json.dump() provides no such option.
But I can’t find enough information on the internals of the json module to impose newlines and extra blank lines within the existing json.dump() method. Any help?


Answers


  1. You can use the json module's dumps() to serialize each object on its own and write the brackets, commas, and blank lines yourself:

    import json
    
    
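    def _dump_one(example, p):
        with open(p, 'w') as f:
            f.write('[\n')
            for i, e in enumerate(example):
                s = json.dumps(e)  # one object, serialized on a single line
                f.write(f'\t{s}')
                if i < len(example) - 1:
                    f.write(',\n\n')  # comma, end of line, then a blank line
                else:
                    f.write('\n')  # last object: no trailing comma
            f.write(']\n')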
    
    
    example = [
        {"a": 1, "b": 2, "c": 3},
        {"d": 1, "e": 2, "f": 3},
        {"g": 1, "h": 2, "i": 3},
        {"j": 6, "k": 7, "l": 8}
    ]
    
    _dump_one(example, 'file.json')
    
    

    You can also build the whole string first and call write() only once:

    import json
    
    
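    def _dump_two(example, p):
        res = '[\n'
        for i, e in enumerate(example):
            s = json.dumps(e)  # one object, serialized on a single line
            res += f'\t{s}'
            if i < len(example) - 1:
                res += ',\n\n'  # comma, end of line, then a blank line
            else:
                res += '\n'  # last object: no trailing comma
        res += ']\n'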
    
        with open(p, 'w') as f:
            f.write(res)
    
    
    example = [
        {"a": 1, "b": 2, "c": 3},
        {"d": 1, "e": 2, "f": 3},
        {"g": 1, "h": 2, "i": 3},
        {"j": 6, "k": 7, "l": 8}
    ]
    
    _dump_two(example, 'file.json')
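A variant of the same idea that drops the index bookkeeping: serialize each object, then let str.join insert the separators, which also avoids the repeated += concatenation in _dump_two. This sketch indents with four spaces instead of a tab so the output matches the layout in the question exactly; format_spaced and dump_spaced are hypothetical helper names, not part of the json module.

```python
import json


def format_spaced(records, indent="    "):
    """Render a list of dicts with one object per line and a blank line between.

    format_spaced is a hypothetical helper, not a json module function.
    """
    # Each object is serialized compactly on its own line.
    rows = [indent + json.dumps(r) for r in records]
    # ",\n\n" adds a comma after every object except the last, then a blank line.
    return "[\n" + ",\n\n".join(rows) + "\n]\n"


def dump_spaced(records, path):
    # Hypothetical helper: rewrites the whole file on every call.
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_spaced(records))


example = [{"a": 1, "b": 2, "c": 3}, {"d": 1, "e": 2, "f": 3},
           {"g": 1, "h": 2, "i": 3}, {"j": 6, "k": 7, "l": 8}]
text = format_spaced(example)
print(text)
```

Like the two functions above, this still rewrites the entire file each time, so it suits occasional saves better than a dump after every new record.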
    
    
  2. If your top-level object is an array, consider the JSON Lines (JSONL) format instead. It is simply one JSON object per line, with no [] around the whole thing. The advantage is that you can append to a JSONL file without rewriting it, which matters for efficiency: calling dump on an ever-growing array rewrites every record each time, so each dump takes time linear in the array size (quadratic overall), whereas appending one record to a JSON Lines file takes constant time (linear overall for all records).

    To append new objects to a JSON Lines file:

    import json

    with open("output.jsonl", "a") as outf:
        json.dump(new_record, outf)
        outf.write("\n")
    

    To read all records from a JSON Lines file:

    with open("output.jsonl", "r") as inf:
        for row in inf:
            obj = json.loads(row)
            # then process obj or append it to a list, etc.
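A self-contained round trip of the two snippets above, as a sketch: append_record and read_records are hypothetical helper names, and the records and temporary file are made up for illustration. It relies on json.dumps without indent never emitting a literal newline (newlines inside strings are escaped as \n), so each record really occupies exactly one line.

```python
import json
import os
import tempfile


def append_record(path, record):
    # Append mode: only the new line is written; earlier records are untouched.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path):
    # One json.loads per line; blank lines are skipped defensively.
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.jsonl")
    for rec in [{"a": 1, "b": 2}, {"c": 3}, {"d": 4, "e": 5}]:
        append_record(path, rec)
    records = read_records(path)
    with open(path, encoding="utf-8") as f:
        raw = f.read()

print(raw)
print(records)
```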
    
  3. You asked how to write a custom function for json.dump(), using the default= kwarg, that starts each object on a new line and adds a blank line between objects.

    You can't. default is only called for values the encoder cannot otherwise serialize; it never sees the "rows" of your array, and in any case all your values are already serializable. From the json module docs:

    "If specified, default should be a function that gets called for objects that can’t otherwise be serialized."
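A small sketch showing when default actually fires: it records every object it is handed. The record_default function, the seen list, and the date value are made up for illustration; the lists and dicts never reach it, so it has no way to control where newlines go.

```python
import json
from datetime import date

seen = []  # type names of every object passed to default


def record_default(obj):
    # Called only for objects the encoder cannot serialize on its own.
    seen.append(type(obj).__name__)
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


data = [{"a": 1, "b": 2}, {"when": date(2024, 1, 2)}]
out = json.dumps(data, default=record_default)
print(out)
print(seen)
```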
