
Consider the following text:

sample_text = "The fox's color was \u201Cbrown\u201D and it’s speed was quick"

Notice that there is a regular single quote in "fox's" and a right single quote in "it’s".

My goal is to recover the original characters from the escape sequences in sample_text, but I have not been able to do so completely.

I did the following:

>>> sample_text.encode().decode('unicode-escape')
"The fox's color was “brown” and itâ\x80\x99s speed was quick"

Is there any way to get the original right single quote back after decoding sample_text? As the output above shows, my code gives me itâ\x80\x99s instead.
I want it to be: it’s

Edit: As suggested in the comments, I’m adding the output of print(sample_text)

print(sample_text)
output: The fox's color was \u201Cbrown\u201D and it’s speed was quick

Edit: I’m using Python 3.8.10 on Ubuntu.

2 Answers


  1. According to your post and your edits this should work for you:

    >>> text_part_1 = "The fox's color was "
    >>> text_part_2 = " and it’s speed was quick"
    >>> color = "\u201Cbrown\u201D"
    >>> color = color.encode().decode('unicode-escape')
    >>> print(f'{text_part_1}{color}{text_part_2}')
    

    To avoid confusion, I should add that this does not work for me; it gives me this instead:

    >>> print(f'{text_part_1}{color}{text_part_2}')
    The fox's color was âbrownâ and it’s speed was quick
    

    (I’m using Python 3.10.6 on Ubuntu 22.04.2 in WSL2 right now)

    But since the color was output correctly in your code sample

    >>> sample_text.encode().decode('unicode-escape')
    "The fox's color was “brown” and itâ\x80\x99s speed was quick"
    

    it should work for you.

  2. Read about unicode-escape in Python Specific Encodings (emphasis mine):

    Encoding suitable as the contents of a Unicode literal in
    ASCII-encoded Python source code, except that quotes are not escaped.
    Decode from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.

    Hence, .encode().decode('unicode_escape') produces mojibake: .encode() writes UTF-8 bytes by default, but unicode_escape reads them back as Latin-1, as follows:

    'it’s'.encode()                            # b'it\xe2\x80\x99s'
    'it’s'.encode().decode('unicode_escape')   #  'itâ\x80\x99s'
    'it’s'.encode().decode('latin-1')          #  'itâ\x80\x99s'
    'it’s'.encode().decode('unicode_escape') == 'it’s'.encode().decode('latin-1')
    #                                          # True
    

    The solution is shown in the following code:

    sample_text = "The fox's color was \u201Cbrown\u201D and it’s speed was quick"
    print(sample_text)    # regular python text
    sample_text = r"The fox's color was \u201Cbrown\u201D and it’s speed was quick"
    print(sample_text)    # raw python text
    print(sample_text.encode('raw_unicode_escape').decode('unicode_escape'))
    

    Linux:

    ~$ python3
    
    Python 3.8.10 (default, Nov 14 2022, 12:59:47)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    
    >>> sample_text = "The fox's color was \u201Cbrown\u201D and it’s speed was quick"
    >>> print(sample_text)
    The fox's color was “brown” and it’s speed was quick
    >>> sample_text = r"The fox's color was \u201Cbrown\u201D and it’s speed was quick"
    >>> print(sample_text)
    The fox's color was \u201Cbrown\u201D and it’s speed was quick
    >>> print(sample_text.encode('raw_unicode_escape').decode('unicode_escape'))
    The fox's color was “brown” and it’s speed was quick
    
    >>>
    

    Windows:

    Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]
    IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: sample_text = "The fox's color was \u201Cbrown\u201D and it’s speed was quick"
       ...: print(sample_text)
       ...: sample_text = r"The fox's color was \u201Cbrown\u201D and it’s speed was quick"
       ...: print(sample_text)
       ...: print(sample_text.encode('raw_unicode_escape').decode('unicode_escape'))
       ...:
    
    The fox's color was “brown” and it’s speed was quick
    The fox's color was \u201Cbrown\u201D and it’s speed was quick
    The fox's color was “brown” and it’s speed was quick
    
    In [2]:
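
For reference, the round trip from the second answer can be wrapped in a small helper. This is only a sketch: the function name decode_unicode_escapes is made up for illustration, and it assumes the input really contains literal backslash-u sequences (as print(sample_text) in the question suggests). The last few lines also show one way to repair a string that has already been turned into mojibake by .encode().decode('unicode_escape'), by reversing the Latin-1 misreading.

```python
def decode_unicode_escapes(text: str) -> str:
    """Hypothetical helper: interpret literal backslash-u escapes in text."""
    # raw_unicode_escape keeps characters up to U+00FF as single Latin-1 bytes
    # and writes everything above that as \uXXXX, so the right single quote
    # survives as an escape instead of being split into UTF-8 bytes.
    encoded = text.encode("raw_unicode_escape")
    # unicode_escape reads plain bytes as Latin-1 and resolves the escapes.
    return encoded.decode("unicode_escape")


# Raw string, so the backslashes are literally part of the data.
sample_text = r"The fox's color was \u201Cbrown\u201D and it’s speed was quick"
fixed = decode_unicode_escapes(sample_text)
print(fixed)

# Repairing text that was already damaged by the original attempt:
mojibake = "it’s".encode().decode("unicode_escape")    # 'itâ\x80\x99s'
repaired = mojibake.encode("latin-1").decode("utf-8")  # undo the Latin-1 read
print(repaired)
```

Note that unicode_escape also interprets other backslash sequences (for example a literal backslash followed by n), so this approach is only appropriate when that is acceptable for the data at hand.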
    