I’m querying the Facebook API in PHP to get a list of posts and display it on a website.
// $facebook is an instance of Facebook\Facebook
$response = $facebook->get('posts?fields=id,message,created_time,full_picture,permalink_url,status_type&limit=20');
$graphEdge = $response->getGraphEdge();
$posts = [];
foreach ($graphEdge as $post) {
    $message = $post->getField('message');
}
The text returned by the call looks like the picture below:
My problem is that sometimes the formatting of the text seems to be embedded in the characters themselves. For example, the text "Montélimar – aux Portes du Soleil" uses a different font than what’s defined in CSS and I can’t force it to use a different style. The HTML looks like this:
<p>
Profitez d’un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️
Notre lotissement « 𝐋𝐞 𝐃𝐨𝐦𝐚𝐢𝐧𝐞 𝐝𝐞 𝐆𝐞́𝐫𝐲 » ...
</p>
We even store the data in a JSON object and it looks like this (see the "description" field):
[
{
"pageName": "---",
"type": "---",
"date": "---",
"description": "Profitez d’un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️ Notre lotissement « 𝐋𝐞 𝐃𝐨𝐦𝐚𝐢𝐧𝐞 𝐝𝐞 𝐆𝐞́𝐫𝐲 » ...",
"time": 0000,
"thumbnail": "---",
"url": "---",
"img": "---"
}
]
As you can see, some text has a default styling that I can’t figure out how to get rid of. I’ve tried re-encoding the text to UTF-8 in PHP using mb_convert_encoding(), but this doesn’t solve the problem because the string is already UTF-8.
How can I remove this formatting? Is this even formatting, or just special UTF-8 symbols?
2 Answers
If you copy one of the characters (for example, the "M" of "Montélimar") and look it up in the Unicode Character Table (https://unicode-table.com/en/1D40C/), you will find that it is not an ordinary letter but a "Mathematical Bold Capital M", with this code point and glyph:
U+1D40C
𝐌
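To check which code points a string really contains, you can dump them from PHP instead of looking characters up by hand. This is a minimal sketch, assuming the mbstring extension and PHP 7.4 or later; `codePoints` is a hypothetical helper name, not part of the Facebook SDK.

```php
<?php
// Hypothetical helper: list the Unicode code points of a UTF-8 string.
function codePoints(string $text): array
{
    $points = [];
    // mb_str_split() splits on characters, not bytes.
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $points[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

// "\u{1D40C}" is the bold "M" from the post, followed by a plain "o".
echo implode(' ', codePoints("\u{1D40C}o")), PHP_EOL; // U+1D40C U+006F
```

Any value in the U+1D400 to U+1D7FF range belongs to the Mathematical Alphanumeric Symbols block rather than to the ordinary Latin alphabet.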
So this is a problem with the content itself and not an encoding problem. The encoding is fine, and I don't think you can do anything to fix this appearance issue.
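One option that may be worth trying: the Mathematical Bold letters are Unicode compatibility characters, so NFKC normalization folds them back to ordinary letters. The sketch below assumes the intl extension is installed; `normalizeStyledText` is a hypothetical helper name. Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms, superscripts), so check that this is acceptable for your content.

```php
<?php
// Hypothetical helper: fold "styled" Unicode letters back to plain ones.
function normalizeStyledText(string $text): string
{
    // Normalizer is provided by the intl extension; without it,
    // return the text unchanged.
    if (!class_exists('Normalizer')) {
        return $text;
    }

    // NFKC = compatibility decomposition followed by canonical composition.
    $normalized = Normalizer::normalize($text, Normalizer::FORM_KC);

    // normalize() returns false on invalid input; keep the original then.
    return $normalized === false ? $text : $normalized;
}

// Bold "Mont" from the post, written as code point escapes.
echo normalizeStyledText("\u{1D40C}\u{1D428}\u{1D427}\u{1D42D}"), PHP_EOL;
```

In the loop from the question, this would be applied to the value returned by `$post->getField('message')` before the text is stored or displayed.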
If the special UTF-8 characters get in the way, you can try converting the string to ASCII with iconv. However, there is a risk that individual characters, and in some cases important information, will be lost.
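A minimal sketch of the iconv approach, assuming the iconv extension is available; `toAscii` is a hypothetical helper name. The result of `//TRANSLIT` depends on the underlying iconv implementation and the current locale (accented letters may become plain letters or `?`), and some builds reject the suffix altogether, so treat the output as something to verify on your own server.

```php
<?php
// Hypothetical helper: transliterate UTF-8 text to plain ASCII with iconv.
function toAscii(string $text): string
{
    // //TRANSLIT asks iconv to approximate characters that have no ASCII
    // equivalent. The @ suppresses the notice or warning that iconv raises
    // when the conversion is not possible.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; fall back to the original text.
    return $ascii === false ? $text : $ascii;
}

echo toAscii("Profitez d\u{2019}un cadre de vie id\u{00E9}al"), PHP_EOL;
```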
Especially for the French language, this code could produce slightly better results: