PHP function file_get_contents($url) returns special characters

IbrahimELSanosi
June 2, 2023
126 views
0 votes
2 Answers

I am trying to retrieve the metadata for a given link (URL). I have implemented the following steps:

$url = "url is here";
$html = file_get_contents($url);
$crawler = new Crawler($html); // Symfony library
$description = $crawler->filterXPath("//meta[@name='description']")->extract(['content']);

With this, I manage to retrieve the metadata for some URLs but not for all.
For some URLs, file_get_contents($url) returns special characters like (x1F‹x08x00x00x00x00x00x04x03ì½}{ãÆ‘/ú÷øSÀœ’x1E)! ‘z§¬qlÇI……….), which is why I could not retrieve the metadata.

Note that all $url values point to the same website and differ only in the slug (different blog URLs such as https://www.example.com/blog-1).

Attempts:

I tried the mb_convert_encoding and mb_detect_encoding functions.
I made sure that all the URLs I passed are accessible in the browser.

Any thoughts on why I sometimes get special characters when calling file_get_contents, and sometimes get correctly formatted HTML?

Answers

Chosen as BEST ANSWER
- IbrahimELSanosi
- June 2, 2023 at 4:06 pm
- 0 votes
I solved the issue by appending the following URL-encoded query parameters to the URL passed to file_get_contents:
```
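// Class constants: query string template; %s is replaced by the encoded object name.
private const EMBED_URL_APPEND = '?tab=3&object=%s&type=subgroup';

// Object name with a non-ASCII character (Ö), so it has to be URL-encoded.
private const EMBED_URL_ENCODE = 'CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1';

// Inside the method: append the encoded query string to $url, then fetch the page.
$urlEncoded = sprintf($url . self::EMBED_URL_APPEND, rawurlencode(self::EMBED_URL_ENCODE));

$html = file_get_contents($urlEncoded);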
```
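The snippet above is a fragment of a class, so here is a minimal standalone sketch of the same idea: percent-encode the non-ASCII value with rawurlencode before it goes into the query string. The function name buildEmbedUrl and the shortened object name are hypothetical, and the sketch assumes the source file is saved as UTF-8. Unlike the fragment, it applies sprintf only to the template, so a stray % in the base URL is not treated as a format specifier.

```php
<?php
// Hypothetical helper, not part of the original answer.
function buildEmbedUrl(string $baseUrl, string $object): string
{
    // Same query string template as in the answer above.
    $template = '?tab=3&object=%s&type=subgroup';

    // rawurlencode() percent-encodes every byte outside A-Z a-z 0-9 - _ . ~
    return $baseUrl . sprintf($template, rawurlencode($object));
}

// Example with a shortened, made-up object name containing "Ö".
$exampleUrl = buildEmbedUrl('https://www.example.com/blog-1', 'CM_ÖVRIGT_1');
```

The resulting URL can then be passed to file_get_contents as in the answer.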


- CarloPokker
- June 2, 2023 at 11:36 am
- 0 votes
The output mixes printable characters with escape sequences such as "x08" or "x1E", which represent control characters or other non-printable bytes. The "x1F" at the beginning, followed by "‹" (byte 0x8B) and "x08", matches the gzip signature, which suggests that the data is compressed rather than plain HTML.

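Try decoding the $html variable before parsing it, for example with gzdecode() if it is gzip-compressed, or with base64_decode() if it turns out to be Base64.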
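As a rough sketch of that suggestion, assuming the body really is gzip-compressed (file_get_contents does not decompress HTTP responses on its own), the check below looks for the gzip magic bytes 0x1F 0x8B and inflates the body only when they are present. The helper name decodeBodyIfGzipped is hypothetical, and gzdecode requires PHP's zlib extension.

```php
<?php
// Hypothetical helper: returns the body as-is, or inflated if it looks like gzip.
function decodeBodyIfGzipped(string $body): string
{
    // Every gzip stream starts with the two magic bytes 0x1F 0x8B.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        // Suppress the warning for corrupt data; gzdecode() then returns false.
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }

    // Not gzip (or not decodable): hand back the original bytes untouched.
    return $body;
}

// Local demonstration without a network call: compress a sample, then decode it.
$sample = '<meta name="description" content="demo">';
$roundTrip = decodeBodyIfGzipped(gzencode($sample));
```

In the question's code this would wrap the download, for example `$html = decodeBodyIfGzipped(file_get_contents($url));`, before the result is handed to the Crawler.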
