
I am trying to save a string to a .json file, and it has to have the format "\uXXXX", but I get the unicodeescape error:

(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

Doing a raw string doesn’t work, because I can’t get a string like this as a result: "\uXXXX".

This is what I'm trying:

import json

bp = {"type": "bitmap",
      "file": "file",
      # This line raises the error: "\u" must be followed by four hex digits
      "chars": [f"\u{hex(57392)[2:].upper()}"]
      }
with open(dir, 'w') as f:
    json.dump(bp, f, indent=4)

And this is what I am trying to get:

{
    "type": "bitmap",
    "file": "file",
    "chars": ["\uE030"]
}

I only want to generate the JSON; I'm not loading it afterwards.

Is there any way to save the string in the format I need?

2 Answers


  1. Use chr to get the correct Unicode character, instead of trying to construct a Python escape sequence dynamically.

    bp = {"type": "bitmap",
          "file": "file",
          "chars": [chr(57392)]
         }
    

    By default, json.dump will encode non-ASCII characters in strings using \uXXXX escapes, rather than writing them as UTF-8 characters.
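
As a minimal sketch of the asker's use case (writing to a file rather than printing), the snippet below dumps the dictionary to a temporary file and reads the raw text back. The file name font.json and the temporary directory are assumptions made for illustration; they are not part of the question.

```python
import json
import tempfile
from pathlib import Path

bp = {"type": "bitmap", "file": "file", "chars": [chr(57392)]}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "font.json"  # hypothetical file name
    with open(out_path, "w") as f:
        json.dump(bp, f, indent=4)
    # Read the raw file text back to see exactly what was written
    written = out_path.read_text()

print(written)
```

The file text contains the six ASCII characters \ue030 between the quotes, and loading it again gives back the single character U+E030.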

  2. The json standard library automatically escapes the data in strings that are passed to it when it writes the file with json.dump (or creates the string with json.dumps). It has to, because the JSON file format requires it. It is not valid JSON, for example, if the file says something like {"bad": "\"}. So if the input dictionary is {'bad': '\\'} (i.e., the value contains one backslash, which is represented as two in the source code), then the output JSON will show two backslashes in the actual file.
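
The backslash case can be checked directly. This is a small sketch using only json.dumps and json.loads; the variable names are illustrative.

```python
import json

value = "\\"  # one backslash; the source code needs two to spell it
encoded = json.dumps({"bad": value})
print(encoded)  # the JSON text shows two backslashes

# Reading the JSON back recovers the single backslash
decoded = json.loads(encoded)
```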

    Similarly with Unicode escapes. If the file should contain "\ue030", then the input string should contain the actual character represented by this escape sequence. For example, the source code could contain a string literal with that actual character (U+E030, a private-use code point that may not display here), or the source code could itself use an escape sequence ('\ue030').

    Uppercase vs. lowercase does not matter; it means the same thing when the file is read by a proper JSON tool. In fact, a valid JSON file can contain either the character or the escape sequence, too. It means the same thing.
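
A quick way to confirm this equivalence is to parse all three spellings with json.loads and compare the results. The sketch below builds the JSON text by hand purely for illustration.

```python
import json

upper = json.loads('"\\uE030"')  # JSON text: "\uE030"
lower = json.loads('"\\ue030"')  # JSON text: "\ue030"
# JSON text containing the character itself, with no escape at all
literal = json.loads('"' + chr(0xE030) + '"')

print(upper == lower == literal)
```

All three parse to the same one-character string.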

    By default, json will already choose to use the escape sequence when it writes out the data. (To make it put Unicode text in the file instead, pass ensure_ascii=False.)
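
The effect of ensure_ascii can be seen side by side in this short sketch (the dictionary is a cut-down version of the one in the question):

```python
import json

data = {"chars": [chr(0xE030)]}

escaped = json.dumps(data)                        # default: ensure_ascii=True
unescaped = json.dumps(data, ensure_ascii=False)  # keeps the character itself

print(escaped)
print(len(escaped), len(unescaped))
```

When writing the unescaped form to a file, it is safest to open the file with an explicit encoding such as encoding="utf-8", since the output is no longer pure ASCII.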

    It is not possible to "construct" an escape sequence by "joining", for example, '\u' with 'e030', because '\u' is invalid on its own. Escape sequences in string literals are interpreted when Python parses the source code, which happens before any operations like joining strings together.

    To turn a number into the corresponding Unicode code point, use chr.
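
The difference between joining text and calling chr can be sketched as follows. compile() is used here only so the syntax error can be caught at run time; the variable names are illustrative.

```python
import json

code_point = 57392
hex_digits = format(code_point, "x")  # 'e030'

# A literal "\u" with no hex digits is rejected when the source is parsed
try:
    compile('"\\u"', "<demo>", "eval")
    literal_compiled = True
except SyntaxError:
    literal_compiled = False

# A raw string avoids the error but yields six ordinary characters:
# a backslash, 'u', and the four hex digits
joined = r"\u" + hex_digits
joined_json = json.dumps(joined)  # the backslash itself gets escaped

# chr builds the single character, which json then escapes as intended
char = chr(int(hex_digits, 16))
char_json = json.dumps(char)

print(joined_json)
print(char_json)
```

This is why the raw-string attempt in the question ends up with a doubled backslash in the file, while chr produces the desired escape.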

    Let’s demonstrate at the REPL:

    >>> import json
    >>> bp = {"type": "bitmap",
    ...                 "file": "file",
    ...                 "chars": [chr(57392)]
    ...                 }
    >>> 
    >>> print(json.dumps(bp, indent=4))
    {
        "type": "bitmap",
        "file": "file",
        "chars": [
            "\ue030"
        ]
    }
    

    Again: we do not get to control the e being lowercase, and we do not need to. It means the same thing when the file is read.
