
I’m trying to decode text that contains extended ASCII characters, but when I try to convert a character I get the wrong value, like this:

    echo "“<br>";
    echo ord("“")."<br>";
    echo chr(ord("“"))."<br>";

And this is my output:

“
226
�

The ASCII value of the character "“" is 147, not 226. And instead of the � symbol, I want to get the "“" character back.

I’m using UTF-8:

    <meta charset="utf-8">

I have tried changing to different charsets, but it didn’t work.

3 Answers


  1. You’re incorrect about the character: 147 (0x93) is the code point U+0093, the control character SET TRANSMIT STATE, and its UTF-8 encoding is the two bytes c2 93.

    In the manual for ord() it says:

    However, note that this function is not aware of any string encoding,
    and in particular will never identify a Unicode code point in a
    multi-byte encoding such as UTF-8 or UTF-16.

    On top of this, if I actually convert the '“' character to hexadecimal, I get e2809c, so it’s a triplet of bytes. Never trust what you read online. 😏

    See: https://3v4l.org/57UV8

  2. 1st: U+201C LEFT DOUBLE QUOTATION MARK is the UTF-8 byte sequence E2 80 9C (hexadecimal), i.e. decimal 226 128 156.

    2nd: the manual describes ord() as "Convert the first byte of a string to a value between 0 and 255".

    Result: ord("“") returns 226

    Instead of the ord and chr pair, use mb_ord and its complement mb_chr, e.g. as follows:

    <?php
    echo "“<br>";
    echo mb_ord("“")."<br>";
    echo mb_chr(mb_ord("“"))."<br>";
    ?>
    

    Result of running SO74045685.php:

    “
    8220
    “

    Edit: you can get the Windows-1251 code (147) for the character “ (U+201C, LEFT DOUBLE QUOTATION MARK) as follows:

    echo ord(mb_convert_encoding("“","Windows-1251","UTF-8"));  //147
    
  3. There is no ASCII representation for “. As has already been said, it is multibyte, UTF-8 to be precise:

    echo mb_detect_encoding("“"); // UTF-8
    

    ord() and chr() don’t support this; you’re only looking at the first of the up to four bytes needed for a particular character. Fortunately, there are functions that do:

    echo "“\n"; // “
    echo mb_ord("“")."\n"; // 8220
    echo mb_chr(mb_ord("“")); // “
    

    But why do you need to transform it back and forth? It seems you already have the character in your code :), not as a value but as the actual visual representation.

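To see why ord() gives 226 and chr() then produces �, it can help to dump the individual bytes of “. The following is a minimal sketch, assuming PHP 7.0+ (for the \u{...} string escape) with the mbstring extension enabled; the variable name $quote is only illustrative.

```php
<?php
// Inspect the raw UTF-8 bytes of “ (U+201C).
$quote = "\u{201C}"; // the same character as "“", written as a Unicode escape

echo bin2hex($quote), "\n"; // e2809c
echo strlen($quote), "\n";  // 3 (strlen counts bytes, not characters)

// ord() only ever looks at a single byte, so walk the string byte by byte.
foreach (str_split($quote) as $byte) {
    echo ord($byte), " ";   // prints 226 128 156
}
echo "\n";

// chr(226) yields the lone byte 0xE2, which is not a complete UTF-8 sequence;
// that is why a UTF-8 page shows the replacement character � for it.
var_dump(mb_check_encoding(chr(ord($quote)), 'UTF-8')); // bool(false)
```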
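For the reverse direction the question asks about (getting “ back from 147), here is a minimal sketch using mb_convert_encoding(). ASCII only covers 0 to 127, so 147 has to come from a single-byte code page; the sketch assumes Windows-1252, where 0x93 (147) is “ (Windows-1251, used in the second answer, maps 0x93 to the same character). It also assumes PHP 7.2+ for mb_ord() and mb_chr(); the variable names are illustrative.

```php
<?php
// Assumption: the expected 147 is the Windows-1252 code of “.
$quote = "\u{201C}"; // “ as UTF-8

// UTF-8 -> single byte in Windows-1252 -> numeric code
$code = ord(mb_convert_encoding($quote, 'Windows-1252', 'UTF-8'));
echo $code, "\n"; // 147

// numeric code -> single byte -> back to UTF-8 for output on a UTF-8 page
$back = mb_convert_encoding(chr($code), 'UTF-8', 'Windows-1252');
echo $back, "\n"; // “

// The Unicode code point is a different number: 8220 (0x201C).
echo mb_ord($quote, 'UTF-8'), "\n"; // 8220
echo mb_chr(8220, 'UTF-8') === $quote ? "same\n" : "different\n"; // same
```

The point of the sketch is that a numeric value only identifies a character together with an encoding: 147 in Windows-1252, 8220 as a Unicode code point, and the bytes e2 80 9c in UTF-8 all denote the same “.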