
I’m playing with Telegram bot development.
The only thing I can’t get working is sending Unicode characters.

I call the “sendMessage” API from PHP with cURL:

curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array("chat_id" => $chat_id, "text" => "\u2b50"));

The code above should post a star icon in the chat, but instead it shows the literal text:

\u2b50

  • Escaping the text (“\\u2b50”) doesn’t work.
  • If the bot acts as an echo (replying with the text it received), typing “\u2b50” in the client makes it reply with the star icon.
  • The same thing happens with keyboard keys (reply_markup.keyboard).

Thanks in advance

EDIT:
Solved with bobince’s solution (thanks!).

I used an inline function like this:

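// Turn literal \uXXXX sequences (backslash, "u", four hex digits) into UTF-8.
// '\\\\u' in a single-quoted PHP string is the regex \\u: a literal backslash, then "u".
$text = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', hexdec($match[1])));
}, $text);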

or, on older PHP versions only (the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0):

$text = preg_replace("/\\\\u([0-9a-fA-F]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('$1')))", $text);
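An alternative sketch, swapping iconv for json_decode(): JSON strings use the same \uXXXX escape syntax, so a run of escapes can be wrapped in quotes and decoded straight to UTF-8. Matching whole runs also lets UTF-16 surrogate pairs (characters above U+FFFF, such as \ud83d\ude00) decode to a single 4-byte character, which the one-escape-at-a-time iconv version above does not do. The function name decode_unicode_escapes is hypothetical, and this assumes PHP 7.0+ with the bundled JSON extension.

```php
<?php
// Hypothetical helper: turns literal \uXXXX escapes into UTF-8 characters.
function decode_unicode_escapes(string $text): string
{
    // Match runs of consecutive \uXXXX escapes so that surrogate pairs
    // (e.g. \ud83d\ude00) are decoded together.
    return preg_replace_callback('/(?:\\\\u[0-9a-fA-F]{4})+/', function (array $m): string {
        // JSON uses the same escape syntax, so let json_decode() do the work.
        $decoded = json_decode('"' . $m[0] . '"');
        // Leave the input untouched if it is not valid (e.g. a lone surrogate).
        return is_string($decoded) ? $decoded : $m[0];
    }, $text);
}

// Single quotes keep the backslash, just like text received from elsewhere.
echo decode_unicode_escapes('Star: \u2b50'), "\n";
```

Unlike the regex-plus-iconv approach, this does not depend on the iconv extension being installed.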


Answers


  1. Set the charset to UTF-8:

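    $headers = array(
        "Content-Type: application/x-www-form-urlencoded; charset=UTF-8"
    );
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POST, true);
    // Encode the fields as a query string so the body matches the declared Content-Type
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array("chat_id" => $chat_id, "text" => "\u2b50")));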
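A fuller sketch of the request, assuming PHP 7.0+ with the cURL extension. The endpoint form https://api.telegram.org/bot<token>/sendMessage is the Bot API's; the helper names, the token and the chat ID 12345 are placeholders. The charset header only describes the body: the text has to already be UTF-8 bytes (here written with PHP 7's "\u{2B50}" escape), because six literal characters \u2b50 will still arrive as six characters.

```php
<?php
// Hypothetical helper: builds the urlencoded body for a sendMessage call.
function build_send_message_body($chatId, string $text): string
{
    // http_build_query() percent-encodes the raw bytes of each value, so a
    // UTF-8 string reaches the server as UTF-8 (U+2B50 -> %E2%AD%90).
    return http_build_query(['chat_id' => $chatId, 'text' => $text]);
}

// Hypothetical wrapper around the cURL calls; $token is a placeholder bot token.
function send_message(string $token, $chatId, string $text)
{
    $ch = curl_init('https://api.telegram.org/bot' . $token . '/sendMessage');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
    ]);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_send_message_body($chatId, $text));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Returns the JSON response as a string, or false on a transport error.
    return curl_exec($ch);
}

echo build_send_message_body(12345, "\u{2B50}"), "\n"; // chat_id=12345&text=%E2%AD%90
```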
    
  2. “\u2b50”

    PHP string literal syntax doesn’t have \u escapes, primarily because PHP strings are not Unicode-based; they’re just a list of bytes.

    Consequently, if you want to include a non-ASCII character in a string, you need to encode the character to bytes using whatever encoding the consumer of your output expects.

    If the Telegram web service is expecting to receive UTF-8 (and I’ve no idea if it is, but it’s a good guess for any modern web app), then the UTF-8-encoded bytes for U+2B50 are 0xE2, 0xAD and 0x90, and so the string literal you should use is:

    "\xE2\xAD\x90"
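PHP 7.0 and later also accept a \u{...} code point escape in double-quoted strings (with braces; a bare \u is still left as-is). A quick check, assuming PHP 7.0+, that both spellings produce the same three bytes:

```php
<?php
// Byte by byte: the three UTF-8 bytes of U+2B50 written as hex escapes.
$starBytes = "\xE2\xAD\x90";
// PHP 7.0+: \u{...} takes a code point and emits its UTF-8 encoding.
$starEscape = "\u{2B50}";

var_dump($starBytes === $starEscape); // bool(true)
echo strlen($starBytes), "\n";        // 3
echo bin2hex($starEscape), "\n";      // e2ad90
```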
    

    If you want to convert a Unicode codepoint to a UTF-8 string more generally:

    function unichr($i) {
        return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
    }
    
    unichr(0x2B50);  // "\xE2\xAD\x90"
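To see where 0xE2, 0xAD and 0x90 come from, here is a sketch that encodes a code point to UTF-8 with plain bit arithmetic instead of iconv. The name utf8_chr is hypothetical; on PHP 7.2+ with the mbstring extension, mb_chr(0x2B50, 'UTF-8') should give the same result.

```php
<?php
// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
function utf8_chr(int $cp): string
{
    // Reject values outside Unicode and UTF-16 surrogates, which UTF-8 cannot encode.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException(sprintf('Invalid code point U+%X', $cp));
    }
    if ($cp < 0x80) {
        return chr($cp);                            // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))               // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));            // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))              // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))      // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));            // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                  // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))         // 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))          // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));                // 10xxxxxx
}

echo bin2hex(utf8_chr(0x2B50)), "\n"; // e2ad90
```

For U+2B50 the top 4 bits (0x2) go into 1110xxxx to give 0xE2, the next 6 bits (0x2D) into 10xxxxxx to give 0xAD, and the last 6 bits (0x10) to give 0x90.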
    