Need to dump a Unicode string into a .json file, but I get a unicodeescape error

JulianOlaldeApaseo
April 13, 2023
275 views
0 votes
2 Answers

I am trying to save a string in a .json file, and it has to have the format "\uXXXX", but I get the unicodeescape error:

(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

Doing a raw string doesn’t work, because I can’t get a string like this as a result: "\uXXXX".

This is what I'm trying:

import json

bp = {"type": "bitmap",
      "file": "file",
      "chars": [f"\u{hex(57392)[2:].upper()}"]
      }
with open(dir, 'w') as f:
    json.dump(bp, f, indent=4)

And this is what I am trying to get:

{
"type": "bitmap",
"file": "file",
"chars": ["\uE030"]
}

I only want to generate the JSON; I'm not loading it afterwards.

Is there any way to save the string in the format I need?

Answers

- chepner
- April 13, 2023 at 4:11 am
- 0 votes
Use chr to get the correct Unicode character, instead of trying to construct a Python escape sequence dynamically.
```
bp = {"type": "bitmap",
      "file": "file",
      "chars": [chr(57392)]
     }
```
By default, json.dump will encode non-ASCII characters in strings using \uXXXX escapes, rather than writing them as UTF-8 characters.
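As a minimal sketch of the whole round trip, the snippet below writes the dictionary to a file and reads the raw text back to show what actually lands on disk. The temporary directory and the file name `font.json` are assumptions made for this illustration only; they are not from the question.

```python
import json
import os
import tempfile

bp = {
    "type": "bitmap",
    "file": "file",
    "chars": [chr(57392)],  # 57392 == 0xE030; one real character, no escape built by hand
}

# Hypothetical output path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "font.json")

with open(path, "w") as f:
    json.dump(bp, f, indent=4)

# Read the raw file text back to inspect what json.dump wrote.
with open(path) as f:
    text = f.read()

print(text)
```

The printed text should contain the six-character escape inside the quotes, even though the Python string holds a single character.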

- KarlKnechtel
- April 13, 2023 at 4:18 am
- 0 votes
The json standard library module automatically escapes the data in strings that are passed to it when it writes the file with json.dump (or creates the string with json.dumps). It has to, because the JSON file format requires that. It is not valid JSON, for example, if the file says something like {"bad": "\"}. So if the input dictionary is {'bad': '\\'} (i.e., the value contains one backslash, which is represented as two in the source code), then the output JSON will show two backslashes in the actual file.
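A short sketch of the backslash case described above, using only json.dumps and json.loads from the standard library:

```python
import json

data = {"bad": "\\"}          # the value is ONE backslash character
print(len(data["bad"]))       # 1

encoded = json.dumps(data)
print(encoded)                # {"bad": "\\"}  (two backslashes in the JSON text)

# Loading the JSON text gives back the single backslash.
print(json.loads(encoded) == data)
```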

Similarly with Unicode escapes. If the file should contain "\ue030", then the input string should contain the actual character represented by this escape sequence. For example, the source code could contain a string literal with that actual character (U+E030 is a private-use code point, so most fonts will not display it), or the source code could itself use an escape sequence ('\ue030').

Uppercase vs. lowercase does not matter; it means the same thing when the file is read by a proper JSON tool. In fact, a valid JSON file can contain either the character or the escape sequence, too. It means the same thing.
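To illustrate the point about case and about escape versus literal character, here is a small check with json.loads. The three JSON texts are made up for this example.

```python
import json

# Each argument is a complete JSON document: a single JSON string.
upper = json.loads('"\\uE030"')               # uppercase hex digits in the escape
lower = json.loads('"\\ue030"')               # lowercase hex digits in the escape
literal = json.loads('"' + chr(0xE030) + '"') # the character itself, unescaped

# All three decode to the same one-character Python string.
print(upper == lower == literal == chr(0xE030))
```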

By default, json will already choose to use the escape sequence when it writes out the data. (To make it put Unicode text in the file instead, pass ensure_ascii=False.)
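A brief comparison of the two settings, as a sketch. If you write the ensure_ascii=False output to a file, it is safest to open that file with an explicit encoding such as encoding="utf-8".

```python
import json

s = chr(0xE030)

escaped = json.dumps(s)                      # default: ensure_ascii=True
raw = json.dumps(s, ensure_ascii=False)      # keeps the character as-is

print(len(escaped))  # 8: two quotes plus the six characters of the escape
print(len(raw))      # 3: two quotes plus the character itself

# Both forms decode to the same string.
print(json.loads(escaped) == json.loads(raw))
```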

It is not possible to "construct" an escape sequence by "joining", for example, '\u' with 'e030', because '\u' is invalid on its own. Python interprets the escape sequences in string literals before any operations like joining strings together.

To turn a number into the corresponding Unicode code point, use chr.
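The difference can be seen in a minimal sketch. The variable names here are illustrative only.

```python
# Joining an escaped backslash with hex digits just yields six ordinary characters.
pieces = "\\u" + "e030"
print(len(pieces))            # 6

# chr builds the single character from its code point instead.
from_int = chr(57392)
from_hex = chr(int("e030", 16))   # same thing, starting from a hex string
print(len(from_int))          # 1

print(from_int == from_hex == "\ue030")
```

If the code point is only available as a hex string, int(..., 16) followed by chr gets to the same character.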

Let’s demonstrate at the REPL:
```
>>> import json
>>> bp = {"type": "bitmap",
...                 "file": "file",
...                 "chars": [chr(57392)]
...                 }
>>> 
>>> print(json.dumps(bp, indent=4))
{
    "type": "bitmap",
    "file": "file",
    "chars": [
        "\ue030"
    ]
}
```
Again: we do not get to control the e being lowercase, and we do not need to. It means the same thing when the file is read.
