
As the subject says, I am having trouble making PHP file_get_contents() calls between domains that share an IP. This was working for quite some time but in the past couple of years it started failing.

When the problem occurred I would notify my support contact (they contact the hosting provider) and a while later the problem would "go away". But only for a while. Sometimes it would be months between occurrences, sometimes just a couple of weeks or days. And sometimes the problem would go away on its own without contacting support.

Unfortunately, I don’t have access to the server logs that might help, because the domain accounts are in a "jailed shell". I can access the PHP error logs, but those only tell me that a connection timed out: "failed to open stream: Connection timed out".

I did research the problem but came up with nothing that pointed to the cause (even if it was me!). The closest I’ve come is finding a possible connection between a firewall and mishandling of IPv4/IPv6.

Then I tried to characterize the problem to see exactly what was and wasn’t failing. That effort can be seen here.

I would really like to understand what the cause is and find a workaround, or a proper fix if possible.

2 Answers


  1. When different hostnames share the same IP address, the Apache web server uses a scheme called virtual hosts to direct each incoming request to the appropriate document root (the directory where the .php files for each hostname live).

    If that’s misconfigured somehow, incoming requests to different host names can get directed to the wrong document root, or even to an invalid document root.
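    One way to check from PHP whether name-based routing is involved is to request the shared IP directly and supply the hostname yourself. This is only a rough sketch: vhost_probe_url() and vhost_probe_options() are hypothetical helper names I made up for illustration, 203.0.113.10 is a placeholder address rather than your server's, and the URL builder only handles IPv4 literals.

    ```php
    <?php
    // Hypothetical helpers for illustration; they are not part of PHP or Apache.

    // Build a plain-HTTP URL that targets the shared IP instead of a hostname.
    function vhost_probe_url(string $ip, string $path = '/'): string
    {
        return 'http://' . $ip . '/' . ltrim($path, '/');
    }

    // Build stream context options that send an explicit Host header,
    // which is what a name-based virtual host uses to pick a document root.
    function vhost_probe_options(string $hostname, float $timeoutSeconds = 5.0): array
    {
        return [
            'http' => [
                'method'        => 'GET',
                'header'        => "Host: {$hostname}\r\n",
                'timeout'       => $timeoutSeconds,
                'ignore_errors' => true, // still return the body on 4xx/5xx responses
            ],
        ];
    }

    // Usage (performs a real request, so it is left commented out):
    // $ctx  = stream_context_create(vhost_probe_options('qa1.lan.example.com'));
    // $body = @file_get_contents(vhost_probe_url('203.0.113.10'), false, $ctx);
    // var_dump($body === false ? error_get_last() : strlen($body));
    ```

    If the probe returns the wrong site's content, that would point at virtual host routing. If it times out just like the normal request, the cause is more likely somewhere below Apache, such as a firewall.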

    Virtual hosts require some server-ops skills to maintain. If your hosting service won’t let you see your error log, they almost certainly won’t let you see the configuration files at /etc/apache2/conf/sites-enabled/.

    Mine look something like this.

    File /etc/apache2/conf/sites-enabled/qa1.lan.example.com.conf. See how the hostname and the document root are both mentioned?

    <VirtualHost *:80>
        ServerName qa1.lan.example.com
        ServerAlias www.qa1.lan.example.com
        ServerAdmin [email protected]
        DocumentRoot /var/www/qa1.lan.example.com
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
        <Directory /var/www/qa1.lan.example.com>
          Options Indexes FollowSymLinks
          AllowOverride All
        </Directory>
    </VirtualHost>
    

    If I had another hostname, for example qa2.lan.example.com, it would have its own configuration file. (I do have such a hostname.)

    If this were happening to me, I’d consider firing my hosting provider. There are plenty of good ones out there.

  2. When I have an issue with file_get_contents() making HTTP requests, I switch to curl.
    file_get_contents() has configurable options, each with a default. This example is from the PHP manual:

    <?php
    // Create a stream
    $opts = array(
      'http'=>array(
        'method'=>"GET",
        'header'=>"Accept-language: en\r\n" .
                  "Cookie: foo=bar\r\n"
      )
    );
    
    $context = stream_context_create($opts);
    
    // Open the file using the HTTP headers set above
    $file = file_get_contents('http://www.example.com/', false, $context);
    ?>
    

    I do not know your PHP server environment and do not need to.

    You could learn about stream contexts, but that takes a lot of thinking. It is likely that the file_get_contents() context is the source of your problem, and that would make sense given the frequency of your issues.
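    If you do want to stay with file_get_contents() for a while, here is a minimal sketch of a context that sets an explicit timeout and can optionally force IPv4, which might help rule the IPv4/IPv6 theory in or out. The names fetch_context_options() and fetch_url() are hypothetical helpers of my own, and the 5 second default is an arbitrary assumption. The 'timeout' and 'bindto' context options themselves are documented in the PHP manual.

    ```php
    <?php
    // Hypothetical helpers for illustration; only the context option names come from PHP.

    // Build context options with an explicit timeout and, optionally, IPv4 only.
    function fetch_context_options(float $timeoutSeconds = 5.0, bool $forceIpv4 = false): array
    {
        $opts = [
            'http' => [
                'method'  => 'GET',
                // Timeout in seconds; without it PHP falls back to default_socket_timeout.
                'timeout' => $timeoutSeconds,
            ],
        ];
        if ($forceIpv4) {
            // Binding the local side to 0:0 makes PHP use an IPv4 socket.
            $opts['socket'] = ['bindto' => '0:0'];
        }
        return $opts;
    }

    // Fetch a URL and report the failure message instead of only logging a warning.
    function fetch_url(string $url, array $opts): array
    {
        error_clear_last();
        $body = @file_get_contents($url, false, stream_context_create($opts));
        if ($body === false) {
            $err = error_get_last();
            return ['body' => null, 'error' => $err['message'] ?? 'unknown error'];
        }
        return ['body' => $body, 'error' => null];
    }

    // Usage (performs a real request, so it is left commented out):
    // $result = fetch_url('http://www.example.com/', fetch_context_options(5.0, true));
    // var_dump($result['error']);
    ```

    If the request succeeds with IPv4 forced and times out without it, that would support the firewall and IPv6 suspicion, though it would not prove it.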

    I suggest you use curl. It is no more difficult than file_get_contents() if someone really familiar with curl shares the basic configuration that almost never fails them for most URLs. When I try to get into sites that do not want anyone using curl, I need to make adjustments. My stance is: if my browser can get there, so can my curl.

    Don’t tell anyone about this, but here is the curl configuration that works for 99.99% of links.
    I just used this last weekend to scrape real estate listings from Zillow.com,
    so I know it still works well for almost every URL.

        // $pages is assumed to be an array of page name => URL, defined elsewhere.
        $request = array();
        // The Host header must match the site being fetched; change or drop it for other hosts.
        $request[] = "Host: www.zillow.com";
        $request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8";
        $request[] = "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:104.0) Gecko/20100101 Firefox/104.0";
        $request[] = "Accept-Language: en-US,en;q=0.5";
        $request[] = "Connection: keep-alive";
        $request[] = "Cache-Control: no-cache";
        $request[] = "Pragma: no-cache";
        foreach ($pages as $page => $url) {
            //echo "$url\n\n\n";
            $ch = curl_init($url);
            // Certificate checks are disabled here; that is insecure outside of testing.
            curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_POST, false);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
            // An empty string accepts every encoding curl supports.
            curl_setopt($ch, CURLOPT_ENCODING, "");
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLINFO_HEADER_OUT, true);
            // With CURLOPT_HEADER on, $data contains the response headers followed by the body.
            curl_setopt($ch, CURLOPT_HEADER, true);
            $data = curl_exec($ch);
            var_export(curl_getinfo($ch));
            echo curl_error($ch);
        }
    