I am trying to save a string to a .json file, and it has to have the format "\uXXXX", but I get a unicodeescape error:
(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
Using a raw string doesn't work, because I can't get a string like this as a result: "\uXXXX".
This is what I'm trying:
    import json

    bp = {"type": "bitmap",
          "file": "file",
          "chars": [f"\u{hex(57392)[2:].upper()}"]
          }
    with open(dir, 'w') as f:
        json.dump(bp, f, indent = 4)
And I am trying to get:
    {
        "type": "bitmap",
        "file": "file",
        "chars": ["\uE030"]
    }
I only want to generate the JSON; I'm not loading it afterwards.
Is there any way to save the string in the format I need?
2 Answers
Use `chr` to get the correct Unicode character, instead of trying to construct a Python escape sequence dynamically. By default, `json.dump` will encode non-ASCII characters using `\u....` escapes, rather than writing them as UTF-8.

The `json` standard library module automatically escapes the data in strings that are passed to it when it writes the file with `json.dump` (or creates the string with `json.dumps`); it has to, because the JSON file format requires that. It is not valid JSON, for example, if the file says something like `{"bad": "\"}`. So if the input dictionary is `{'bad': '\\'}` (i.e., the value contains one backslash, which is represented as two in the source code), then the output JSON will show two backslashes in the actual file.

Similarly with Unicode escapes. If the file should contain `"\ue030"`, then the input string should contain the actual character represented by this escape sequence: for example, the source code could contain a string literal holding that actual character, or the source code could itself use an escape sequence (`'\ue030'`).

Uppercase vs. lowercase does not matter; it means the same thing when the file is read by a proper JSON tool. In fact, a valid JSON file can contain either the character or the escape sequence, too. It means the same thing.
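To make the two escaping rules above concrete, here is a small sketch using only the standard `json` module. The variable names are illustrative and not part of the original answer.

```python
import json

# One backslash in the value (written as two in the source code)...
one_backslash = {'bad': '\\'}
encoded = json.dumps(one_backslash)
# ...becomes two backslashes in the JSON text.
print(encoded)

# The escape sequence in a Python literal denotes a single character.
pua_char = '\ue030'
assert pua_char == chr(0xE030)
assert len(pua_char) == 1

# A JSON reader treats upper- and lowercase hex digits in \u escapes identically.
upper = json.loads('"\\uE030"')
lower = json.loads('"\\ue030"')
assert upper == lower == pua_char
```

Both spellings of the escape decode to the same one-character string, which is why the case of the hex digits in the file does not matter to a JSON reader.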
By default, `json` will already choose to use the escape sequence when it writes out the data. (To make it put Unicode text in the file instead, pass `ensure_ascii=False`.)

It is not possible to "construct" an escape sequence by "joining", for example, `'\u'` with `'e030'`, because `'\u'` is invalid on its own. Escape sequences in string literals are interpreted when the Python source is compiled, before any operations like joining strings together are run.

To turn a number into the corresponding Unicode character, use `chr`.

Let's demonstrate at the REPL:
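The original REPL session is not reproduced here, so the following is a minimal sketch of the same idea written as a script. The code point 57392 and the dictionary come from the question; writing to an in-memory `io.StringIO` instead of a real file is an assumption made to keep the example self-contained.

```python
import io
import json

code_point = 57392          # 0xE030, taken from the question
char = chr(code_point)      # a single character, not a backslash sequence

bp = {"type": "bitmap", "file": "file", "chars": [char]}

# Default behaviour (ensure_ascii=True): non-ASCII is written as a \u escape.
buffer = io.StringIO()      # stands in for the file opened with open(..., 'w')
json.dump(bp, buffer, indent=4)
escaped_text = buffer.getvalue()
print(escaped_text)

# With ensure_ascii=False the character itself is written instead.
raw_text = json.dumps(bp, ensure_ascii=False)

# Both forms decode back to the same data.
assert json.loads(escaped_text) == json.loads(raw_text) == bp
```

The printed text contains `"\ue030"`, with the hex digits in lowercase, while `raw_text` holds the character itself and no backslash escape.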
Again: we do not get to control the `e` being lowercase, and we do not need to. It means the same thing when the file is read.