
I have a byte array that I made, and I am writing it to a json file. This works, but I want to have a formatted JSON file instead of a massive wall of text.

I have tried decoding the byte array with utf-8, but instead I get UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. My plan was to then take this string and use json.dumps() to format it.

Trying json.dumps() without any other formatting gives this: TypeError: Object of type bytearray is not JSON serializable

content = bytearray()
content_idx = 0

try:
  with open(arguments.input_file, 'rb') as input_file:
    while (byte:=input_file.read(1)):
      content += bytes([ord(byte) ^ xor_key[content_idx % (len(xor_key))]])
      content_idx += 1
except (IOError, OSError) as exception:
  print('Error: could not read input file')
  exit()

try:
  with open(arguments.output_file, 'wb') as output_file:
    output_file.write(json.dumps(content.decode('utf-8'), indent=4))
except (IOError, OSError) as exception:
  print('Error: could not create output file')
  exit()

2 Answers


  1. The error occurs because you are passing raw bytes to json.dumps(), which cannot serialize them, as the error message says.

    To save the data as JSON, first convert the byte stream into a Python dictionary (or list), which json.dumps() can serialize directly.

    It would help if you could show what the input data looks like and what you want to save to JSON.

    Python's standard base64 module can turn an array of bytes into a string. Parsing that string back into a dictionary may be a separate problem, though, so it is worth searching for an existing library that handles your format; failing that, you can fall back on regular expressions.

  2. The JSON encoder and decoder can be extended to support other types. Here's one way to support byte strings: convert them to a Base64 str and serialize that as a dict with a special key. The key flags the decoder to convert any JSON object containing it back to a byte string.

    import json
    import base64
    
    class B64Encoder(json.JSONEncoder):
        '''Recognize a bytes object and return a dictionary with
        a special key to indicate its value is a BASE64 string.
        '''
        def default(self, obj):
            if isinstance(obj, bytes):
                return {'__B64__': base64.b64encode(obj).decode('ascii')}
            return super().default(obj)
    
    def B64Decoder(obj):
        '''Recognize a dictionary with the special BASE64 key
        and return its BASE64-decoded value.
        '''
        if '__B64__' in obj:
            return base64.b64decode(obj['__B64__'])
        return obj
    
    d = {'key1': bytes.fromhex('0102030405'), 'key2': b'\xaa\x55\x00\xff'}
    print(f'Python IN: {d}')
    print('\nJSON:')
    s = json.dumps(d, indent=2, cls=B64Encoder)
    print(s)
    d2 = json.loads(s, object_hook=B64Decoder)
    print(f'\nPython OUT: {d2}')
    

    Output:

    Python IN: {'key1': b'\x01\x02\x03\x04\x05', 'key2': b'\xaaU\x00\xff'}
    
    JSON:
    {
      "key1": {
        "__B64__": "AQIDBAU="
      },
      "key2": {
        "__B64__": "qlUA/w=="
      }
    }
    
    Python OUT: {'key1': b'\x01\x02\x03\x04\x05', 'key2': b'\xaaU\x00\xff'}
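
The first answer's advice (turn the bytes into a Python object, then serialize it) can be sketched as follows. The byte 0x8b at position 1 in the asker's error is consistent with the gzip magic number 0x1f 0x8b, so this sketch assumes, without confirmation from the question, that the XOR-decoded data may be gzip-compressed UTF-8 JSON. The helper names xor_bytes and pretty_json and the sample key are hypothetical.

```python
import gzip
import json


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def pretty_json(raw: bytes, indent: int = 4) -> str:
    """Parse raw bytes as JSON and return an indented string.

    Assumption: the bytes hold UTF-8 JSON, optionally gzip-compressed.
    """
    # 0x1f 0x8b is the gzip magic number; decompress first if it is present.
    if raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    obj = json.loads(raw.decode('utf-8'))  # bytes -> str -> dict/list
    return json.dumps(obj, indent=indent)


if __name__ == '__main__':
    # Hypothetical round trip standing in for the asker's input file.
    key = b'\x13\x37'
    document = {'name': 'demo', 'values': [1, 2, 3]}
    payload = gzip.compress(json.dumps(document).encode('utf-8'))
    obfuscated = xor_bytes(payload, key)
    print(pretty_json(xor_bytes(obfuscated, key)))
```

Because json.dumps() returns a str, the result would be written to a file opened in text mode ('w'), not 'wb' as in the question. If the decoded bytes are not JSON at all, json.loads() raises an error and the Base64 approach from the second answer is the fallback.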
    
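One caveat when applying the second answer to the question: the asker's content is a bytearray, and bytearray is not a subclass of bytes, so an isinstance(obj, bytes) check alone would still raise the original TypeError. A minimal sketch of a variant that accepts both types is shown here; the names BytesEncoder and decode_b64 are hypothetical, and the sample payload is made up.

```python
import base64
import json


class BytesEncoder(json.JSONEncoder):
    """Hypothetical variant of B64Encoder that also accepts bytearray."""

    def default(self, obj):
        # bytearray is not a subclass of bytes, so test for both.
        if isinstance(obj, (bytes, bytearray)):
            return {'__B64__': base64.b64encode(obj).decode('ascii')}
        return super().default(obj)


def decode_b64(obj):
    """object_hook: turn {'__B64__': ...} back into a bytes object."""
    if '__B64__' in obj:
        return base64.b64decode(obj['__B64__'])
    return obj


# Made-up sample data standing in for the asker's XOR-decoded content.
content = bytearray(b'\x1f\x8b\x00\xff')
s = json.dumps({'payload': content}, indent=2, cls=BytesEncoder)
restored = json.loads(s, object_hook=decode_b64)
```

Note that the round trip returns bytes rather than bytearray, and that Base64 keeps the file valid JSON without making the payload itself human-readable.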