I’m playing with Telegram bot development.
The only thing I haven’t managed to get working is sending Unicode characters.
This is how I call the “sendMessage” API from PHP with curl:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array("chat_id" => $chat_id, "text" => "\u2b50"));
The code above should post a star icon to the chat, but instead it shows the exact text:
\u2b50
- Escaping the backslash (“\\u2b50”) doesn’t work either.
- If the bot acts as an echo (replying with the received text), typing “\u2b50” in the client makes it reply with the star icon.
- The same behavior occurs for the keyboard keys (reply_markup.keyboard).
Thanks in advance
EDIT:
Solved with the solution from bobince (thanks!).
I used an inline function like:
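// Replace every literal \uXXXX sequence in $text with the UTF-8 bytes of that code point.
// Four backslashes are needed so that the regex engine sees \\ (one literal backslash).
$text = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', hexdec($match[1])));
}, $text);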
or, on PHP versions before 7.0 only (the /e modifier was deprecated in 5.5 and removed in 7.0):
$text = preg_replace("/\\\\u([0-9a-fA-F]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", $text);
2 Answers
set the charset to unicode…
PHP string literal syntax doesn’t have \u escapes, primarily because PHP strings are not Unicode-based; they’re just a list of bytes. Consequently, if you want to include a non-ASCII character in a string, you need to encode the character to bytes using whatever encoding the consumer of your output will be expecting.
If the Telegram web service is expecting to receive UTF-8 (and I’ve no idea if it is, but it’s a good guess for any modern web app), then the UTF-8-encoded bytes for U+2B50 are 0xE2, 0xAD and 0x90, and so the string literal you should use is:
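Given the three bytes listed above, a double-quoted literal with hex escapes would look like the following. This is a minimal sketch: the variable names `$star` and `$fields` and the chat ID value are placeholders for illustration, and it still assumes the service accepts UTF-8.

```php
<?php
// U+2B50 encoded as UTF-8 is the three bytes 0xE2 0xAD 0x90.
// Double quotes are required: single-quoted strings do not process \x escapes.
$star = "\xE2\xAD\x90";

// Hypothetical payload for sendMessage; the chat ID is a placeholder value.
$chat_id = 123456789;
$fields = array("chat_id" => $chat_id, "text" => $star);

// Show the raw bytes that would be sent as the message text.
echo bin2hex($fields["text"]), "\n"; // e2ad90
```

On PHP 7.0 and later there is also a braced code point escape, so "\u{2B50}" in a double-quoted string should produce the same three bytes; the unbraced "\u2b50" from the question is still left as literal text.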
If you want to convert a Unicode codepoint to a UTF-8 string more generally:
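One way to do this, in line with the iconv-based code quoted in the question’s edit, is a small helper like the one below. The function name `codepoint_to_utf8` is made up for this sketch, and it assumes the iconv extension is available (it is enabled by default in standard PHP builds).

```php
<?php
// Hypothetical helper: convert a Unicode code point (an integer) to a UTF-8 string.
function codepoint_to_utf8($codepoint) {
    // pack('V', ...) writes the value as a 32-bit little-endian integer,
    // which is the UCS-4LE encoding of the code point; iconv re-encodes it as UTF-8.
    return iconv('UCS-4LE', 'UTF-8', pack('V', $codepoint));
}

// U+2B50 (the star from the question) becomes the bytes E2 AD 90.
echo bin2hex(codepoint_to_utf8(0x2B50)), "\n"; // e2ad90
```

Because the helper takes an integer rather than four hex digits, it also covers code points above U+FFFF, for example `codepoint_to_utf8(0x1F600)`.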