I’m working on a script that builds an XML feed from strings stored in the database. The strings are user-entered image captions from the Facebook Open Graph API, and according to Facebook they should all be UTF-8. I import the captions into the database and store them as utf8-unicode (I also tried utf8-bin).
But I always get the same error when trying to display the output XML feed, because one of the captions contains a weird whitespace character:
This page contains the following errors:
error on line 63466 at column 14: Input is not proper UTF-8, indicate encoding !
Bytes: 0x0B 0x54 0x68 0x6F
Below is a rendering of the page up to the first error.
In the database (phpMyAdmin) and in the page source (viewed in Chrome), the problematic character appears as an empty square symbol.
If I copy and paste the problematic character into a converter, it reports hexadecimal 000B.
What’s the easiest way to fix this?
I’d also like to understand why the Facebook Graph API is giving me non-UTF-8 characters in the first place, when it’s not supposed to.
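For reference, the byte in the error message, 0x0B, is the vertical tab (U+000B). It is a valid single-byte UTF-8 character, but XML 1.0 does not allow it: of the control characters below U+0020, only tab (U+0009), line feed (U+000A), and carriage return (U+000D) are legal. That may be why encoding checks pass while the XML parser still rejects the feed. Below is a minimal sketch, assuming the captions are already valid UTF-8, of removing every character outside the XML 1.0 range before writing the feed. The function name `strip_invalid_xml_chars` is hypothetical and not part of PHP or of the original script.

```php
<?php
// Hypothetical helper (the name is an assumption): drop every character
// that the XML 1.0 specification does not allow in a document.
function strip_invalid_xml_chars(string $text): string
{
    // Allowed by XML 1.0: U+0009, U+000A, U+000D, U+0020-U+D7FF,
    // U+E000-U+FFFD, U+10000-U+10FFFF. Everything else is removed.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($pattern, '', $text);

    // With the /u modifier, preg_replace() returns null when the input
    // is not valid UTF-8, so that case is reported instead of hidden.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $clean;
}

// Example caption containing a vertical tab (byte 0x0B).
$caption = "Sunset\x0BThe view from the pier";

// The string passes a UTF-8 check even though XML rejects it.
$isUtf8 = mb_check_encoding($caption, 'UTF-8');

// Cleaned caption, safe to place in the feed (still needs XML escaping).
$safeCaption = htmlspecialchars(strip_invalid_xml_chars($caption), ENT_XML1, 'UTF-8');
```

Written in a double-quoted PHP string as `"\x0B"`, the character can also be targeted directly with `str_replace()`, which avoids having to paste it into an editor.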
Failed attempts:
- utf8_encode() doesn’t work, because the rest of the strings are already valid UTF-8.
- I also tried several different ways of stripping out all non-UTF-8 characters, but none of them filter out this specific character. The same goes for filtering out all non-Latin characters.
- htmlentities(), htmlspecialchars(), and similar functions don’t encode the problematic character.
- iconv(mb_detect_encoding()) does not detect the string as invalid UTF-8.
- str_replace() and preg_replace() are of no help: if I try to copy and paste the character into Visual Studio Code, nothing is pasted, not even a whitespace.
- str_replace(“