I noticed that when using with open to read a JSON file, opening it in either 'r' or 'rb' mode returns identical results.
import json
with open('something.json', 'rb') as f:  # 'r' returns the same thing
    t1 = json.load(f)
However, when I write to a JSON file with 'wb', I get an error:
with open('something.json', 'wb') as f:
    json.dump(some_dict, f)
TypeError: a bytes-like object is required, not 'str'
But 'w' works fine. Why is this the case?
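The asymmetry described in the question can be reproduced without touching the disk. This is a minimal sketch that uses io.BytesIO and io.StringIO as in-memory stand-ins for files opened in binary and text mode (an assumption for convenience); the dictionary is hypothetical example data, and it assumes Python 3.6 or later, where json.loads accepts bytes.

```python
import io
import json

some_dict = {"a": 1}  # hypothetical example data

# Reading: json.load calls f.read(); a binary stream returns bytes, which json.loads accepts
binary_in = io.BytesIO(b'{"a": 1}')
loaded = json.load(binary_in)

# Writing: json.dump calls f.write() with str chunks, which a binary stream rejects
binary_out = io.BytesIO()
error = None
try:
    json.dump(some_dict, binary_out)
except TypeError as exc:
    error = exc  # same kind of TypeError as with a file opened in 'wb' mode

# A text stream (stand-in for a file opened with 'w') accepts the str chunks
text_out = io.StringIO()
json.dump(some_dict, text_out)
```

Reading succeeds in both modes, while writing only succeeds on the text stream.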
4 Answers
This happens because you are not writing binary data: json.dump writes str, but a file opened with 'wb' only accepts bytes. Instead, use json.dumps(some_dict) to serialize your dictionary to a JSON string, then call .encode('utf-8') on that string and write the resulting bytes to the file.
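A minimal sketch of that approach, assuming UTF-8 is the encoding you want on disk; some_dict and the file name are placeholder values taken from the question, not a prescribed layout.

```python
import json

some_dict = {"key": "value", "n": 42}  # hypothetical example data

# json.dumps returns a str; encode it to bytes before writing in binary mode
payload = json.dumps(some_dict).encode("utf-8")

with open("something.json", "wb") as f:
    f.write(payload)  # a binary file's write() accepts bytes

# Reading back in binary mode works because json.load accepts bytes input (Python 3.6+)
with open("something.json", "rb") as f:
    round_trip = json.load(f)
```

Here the encoding decision is explicit in your own code rather than left to json.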
As of Python 3.6, the json module tries to auto-detect the encoding of a binary file when reading JSON. UTF-8, UTF-16, and UTF-32 are supported. However, this doesn't work when writing: there's no way to auto-detect what encoding you wanted to use when writing.
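The reading-side auto-detection can be observed directly with json.loads, which accepts bytes and picks an encoding from the leading bytes (a byte-order mark, or the pattern of zero bytes when there is none). This sketch encodes the same hypothetical document several ways and decodes each without telling json the encoding.

```python
import json

data = {"name": "café", "n": 1}  # hypothetical example data
text = json.dumps(data, ensure_ascii=False)

# Some encodings add a BOM ("utf-16", "utf-32"), others do not ("utf-16-le", "utf-32-be")
encodings = ("utf-8", "utf-16", "utf-32", "utf-16-le", "utf-32-be")

# json.loads receives only bytes and infers the encoding on its own
decoded = {enc: json.loads(text.encode(enc)) for enc in encodings}
```

There is no equivalent on the writing side: json.dumps produces str, and choosing the bytes is up to you.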
json could make an assumption, like open does when you open a file in text mode without specifying an encoding, but the tradeoffs are different. If json were to assume UTF-8 (or any other encoding) when writing to a binary file, you might end up reading UTF-32 and writing UTF-8 without noticing the encoding change until some other code that needs UTF-32 breaks. That's less of an issue with open, because open makes the same assumption when reading and writing, instead of trying to auto-detect the encoding when reading.
Since auto-detection is impossible and assuming an encoding would be error-prone, json requires you to give it a text file when writing.
When you open the file in read-only binary mode ('rb'), f.read() returns bytes instead of the str it returns in text mode; both are acceptable input for json.load. When you open the file in write-only binary mode ('wb'), f.write() only accepts bytes-like objects, not str. So when json.dump calls f.write() with a string, it raises the exception you saw above.
Since json.load and json.loads are given data to inspect, they try to guess an encoding when given binary input. They can't do that when dumping, because there is nothing to inspect. So there is a lack of symmetry: you can give a binary file to the reader, at the risk that it guesses the encoding incorrectly, but not to the writer. The writer only works with strings, and you are expected to handle the encoding yourself. Personally, I wouldn't trust the encoding guesser and would always open the file in "r" mode with an explicit encoding.