
I’m querying the Facebook API in PHP to get a list of posts and display it on a website.

// $facebook is an instance of Facebook\Facebook
$response = $facebook->get('posts?fields=id,message,created_time,full_picture,permalink_url,status_type&limit=20');
$graphEdge = $response->getGraphEdge();
$posts = [];

foreach ($graphEdge as $post) {
    // 'message' is null for posts that have no text
    $message = $post->getField('message');
    $posts[] = $message;
}

The text returned by the call looks like the picture below:


My problem is that sometimes the formatting of the text seems to be embedded in the characters themselves. For example, the text "Montélimar – aux Portes du Soleil" is displayed in a different font than the one defined in my CSS, and I can’t force it to use a different style. The HTML looks like this:

<p>
  Profitez d’un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️
  Notre lotissement « 𝐋𝐞 𝐃𝐨𝐦𝐚𝐢𝐧𝐞 𝐝𝐞 𝐆𝐞́𝐫𝐲 » ...
</p>

We even store the data in a JSON object and it looks like this (see the "description" field):

[
    {
        "pageName": "---",
        "type": "---",
        "date": "---",
        "description": "Profitez d’un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️ Notre lotissement « 𝐋𝐞 𝐃𝐨𝐦𝐚𝐢𝐧𝐞 𝐝𝐞 𝐆𝐞́𝐫𝐲 » ...",
        "time": 0000,
        "thumbnail": "---",
        "url": "---",
        "img": "---"
    }
]

As you can see, some of the text has a built-in styling that I can’t figure out how to get rid of. I’ve tried re-encoding the text to UTF-8 in PHP with mb_convert_encoding(), but that doesn’t solve the problem because the string is already UTF-8.
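A minimal sketch of how that claim can be checked, assuming the mbstring extension is loaded. `$description` is a hypothetical sample string containing U+1D40C. If the string is already valid UTF-8, converting it from UTF-8 to UTF-8 returns the same bytes, so re-encoding alone cannot change how it looks.

```php
<?php
// Sketch: confirm a string is already valid UTF-8, in which case
// mb_convert_encoding() from UTF-8 to UTF-8 leaves it unchanged.
// $description is a hypothetical sample; "\u{1D40C}" is the bold "M".
$description = "commune de \u{1D40C}ontélimar";

$isUtf8 = mb_check_encoding($description, 'UTF-8');
$reencoded = mb_convert_encoding($description, 'UTF-8', 'UTF-8');

var_dump($isUtf8);                     // bool(true)
var_dump($reencoded === $description); // bool(true)
var_dump(strlen("\u{1D40C}"));         // int(4): one character, four bytes
```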

How can I remove this formatting? Is this even formatting, or just special UTF-8 symbols?


Answers


  1. Chosen as BEST ANSWER

    If you copy one of the characters (for example the "M" of "Montélimar") and look it up in a Unicode character table (https://unicode-table.com/en/1D40C/), you will find that it is not the Latin letter "M" but "Mathematical Bold Capital M", identified by:

    • Unicode number: U+1D40C
    • HTML-code: &#119820;

    So this is a property of your content itself, not an encoding problem: the bold look comes from the characters, not from your CSS. The encoding is fine, and I don't think you can do anything to change how these characters look.
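    Instead of looking each character up in an online table, the code points can also be listed from PHP. This is a minimal sketch assuming PHP 7.4+ and the mbstring extension; `dumpCodePoints()` is a hypothetical helper name.

    ```php
    <?php
    // Sketch: list the Unicode code point of every character in a string, to
    // tell a plain Latin "M" (U+004D) from a look-alike such as U+1D40C
    // MATHEMATICAL BOLD CAPITAL M. dumpCodePoints() is a hypothetical helper.
    function dumpCodePoints(string $text): array
    {
        $codePoints = [];
        foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
            // %04X pads short code points to 4 hex digits; longer ones stay whole
            $codePoints[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
        }
        return $codePoints;
    }

    // A plain "M" followed by the bold "M" from the post, written as an escape;
    // the printed array contains U+004D and U+1D40C
    print_r(dumpCodePoints("M\u{1D40C}"));
    ```

    Running it on the post's text would also show the accent of "𝐞́" as a separate U+0301 COMBINING ACUTE ACCENT, because there is no precomposed bold "é".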


  2. If these special UTF-8 characters get in the way, you can try converting the string to ASCII with iconv. However, there is a risk that individual characters, and in some cases important information, will be lost.

    $strUTF8mb4 = "Profitez d’un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️ Notre lotissement « 𝐋𝐞 𝐃𝐨𝐦𝐚𝐢𝐧𝐞 𝐝𝐞 𝐆𝐞́𝐫𝐲 » ...";
    $strASCII = iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $strUTF8mb4);
    //string(181) "Profitez d'un cadre de vie id'eal pour faire construire votre maison individuelle sur la commune de Montelimar - aux Portes du Soleil Notre lotissement << Le Domaine de Gery >> ..."
    

    For French text in particular, this variant may give slightly better results, because characters that exist in ISO-8859-15 (such as "é", "«" and "»") are kept:

    $strIso = iconv("UTF-8", "ISO-8859-15//TRANSLIT//IGNORE", $strUTF8mb4);
    $strUtf8 = iconv("ISO-8859-15", "UTF-8", $strIso);
    //"Profitez d'un cadre de vie idéal pour faire construire votre maison individuelle sur la commune de Montelimar - aux Portes du Soleil Notre lotissement « Le Domaine de Gery » ..."
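    A different approach, not part of this answer, is to map only the Mathematical Bold letters and digits back to plain ASCII and keep every other character unchanged, so nothing else is lost. This is a minimal sketch assuming the mbstring extension; `unboldMath()` is a hypothetical helper, and it only covers the bold range (U+1D400 to U+1D433 for A-Z and a-z, U+1D7CE to U+1D7D7 for 0-9). Italic, script and other styled alphabets have their own ranges.

    ```php
    <?php
    // Sketch: replace Mathematical Bold letters/digits with plain ASCII and keep
    // everything else (é, «, », ☀️ ...). unboldMath() is a hypothetical helper.
    function unboldMath(string $text): string
    {
        $result = preg_replace_callback(
            // U+1D400-U+1D433: bold A-Z then a-z; U+1D7CE-U+1D7D7: bold 0-9
            '/[\x{1D400}-\x{1D433}\x{1D7CE}-\x{1D7D7}]/u',
            static function (array $m): string {
                $cp = mb_ord($m[0], 'UTF-8');
                if ($cp <= 0x1D419) {
                    return chr(ord('A') + $cp - 0x1D400); // bold capital letter
                }
                if ($cp <= 0x1D433) {
                    return chr(ord('a') + $cp - 0x1D41A); // bold small letter
                }
                return chr(ord('0') + $cp - 0x1D7CE);      // bold digit
            },
            $text
        );
        // preg_replace_callback() returns null on invalid UTF-8; keep the input then
        return $result ?? $text;
    }

    // Displays "la commune de Montélimar - aux Portes du Soleil ☀️"
    echo unboldMath("la commune de 𝐌𝐨𝐧𝐭𝐞́𝐥𝐢𝐦𝐚𝐫 - 𝐚𝐮𝐱 𝐏𝐨𝐫𝐭𝐞𝐬 𝐝𝐮 𝐒𝐨𝐥𝐞𝐢𝐥 ☀️"), PHP_EOL;
    ```

    The accent in "Montélimar" stays as "e" followed by U+0301, which renders the same as "é". If the intl extension is available, `Normalizer::normalize($text, Normalizer::FORM_KC)` is a broader alternative: NFKC compatibility normalization maps all the mathematical alphabets to plain letters and composes "e" + U+0301 into "é", but it also rewrites other compatibility characters such as ligatures, superscripts and full-width forms.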
    