
There’s a specific website whose source code I want to fetch with PHP cURL.

Visiting this website with a browser on my computer works without any problems.

But when I try to access it with my PHP script, the website detects that the request is automated and shows an error message.

This is my PHP script:

<?php
$url = "https://www.example.com";
// Same user agent string the browser sends (Safari 13 on macOS)
$user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);  // send the browser's User-Agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
$data = curl_exec($ch);
if ($data === false) {
    echo 'cURL error: ' . curl_error($ch);
} else {
    echo $data;
}
curl_close($ch);
?>

The user agent is the same one I use in the browser. I’m running the script on a local MAMP PRO server, so the browser and the PHP script both use the same IP address.

I already tried my PHP script with many different headers and options but nothing worked.

There must be something that makes a request from a PHP script look different from a browser request to the web server I’m trying to access. But what? Do you have an idea?

EDIT

I found out that it works with this cURL command:

curl 'https://www.example.com/' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7'

If I run this in a terminal (e.g. macOS Terminal), it returns the correct source code.

I converted it to a PHP script as follows:

<?php
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
// GET is cURL's default method, so CURLOPT_CUSTOMREQUEST is not needed

// The same headers as in the working terminal command
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
$headers[] = 'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
if ($result === false) {
    echo 'cURL error: ' . curl_error($ch);
} else {
    echo $result;
}
curl_close($ch);
?>

Unfortunately, the PHP version still shows the error message.

So there must be something that makes a request from PHP look different from the command-line request to this web server. But what is it?
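To narrow this down, it can help to compare the request headers PHP actually sends with the ones shown in the browser's developer tools (or printed by "curl -v"). Below is a minimal sketch, assuming the cURL extension is loaded: sentRequestHeaders() uses the CURLINFO_HEADER_OUT option to capture PHP's outgoing headers, and headerDifferences() lists headers the browser sends that are missing or different. All three function names are hypothetical, and the browser's headers have to be pasted in by hand.

```php
<?php
// Sketch: compare PHP's outgoing request headers with a browser's.
// sentRequestHeaders(), parseHeaderLines() and headerDifferences() are
// hypothetical helper names, not part of any library.

// Perform the request and return the raw header block PHP's cURL sent.
function sentRequestHeaders(string $url, array $headers): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => $headers,
        CURLINFO_HEADER_OUT    => true, // keep a copy of the outgoing request headers
    ]);
    curl_exec($ch);
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    curl_close($ch);
    return is_string($sent) ? $sent : '';
}

// Turn "Name: value" lines into a lowercase-name => value map.
// The request line ("GET / HTTP/1.1") does not match and is skipped.
function parseHeaderLines(string $raw): array
{
    $map = [];
    foreach (preg_split('/\r?\n/', $raw) as $line) {
        if (preg_match('/^([A-Za-z0-9-]+):\s*(.*)$/', $line, $m)) {
            $map[strtolower($m[1])] = trim($m[2]);
        }
    }
    return $map;
}

// List headers the browser sends that are missing or different in PHP's request.
function headerDifferences(string $browserRaw, string $phpRaw): array
{
    $browser = parseHeaderLines($browserRaw);
    $php = parseHeaderLines($phpRaw);
    $diff = [];
    foreach ($browser as $name => $value) {
        if (!array_key_exists($name, $php)) {
            $diff[$name] = 'missing in PHP request';
        } elseif ($php[$name] !== $value) {
            $diff[$name] = "PHP sends '{$php[$name]}' instead of '$value'";
        }
    }
    return $diff;
}
```

For example, print_r(headerDifferences($browserHeaders, sentRequestHeaders('https://www.example.com/', $headers))); where $browserHeaders is a string holding the request headers copied from the developer tools and $headers is the array from the PHP script above. Header names are compared case-insensitively because HTTP header names are case-insensitive.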

2 Answers


  1. You should try to mimic a real browser by forging a “real” HTTP request. Send more headers than just the User-Agent, such as “Accept”, “Accept-Language” and “Accept-Encoding”. You will probably also need to accept (and correctly handle) cookies.
    If the target website uses JavaScript to detect a real browser, that is another challenge.

  2. There is no difference between a cURL request and a browser’s request, apart from the HTTP headers each one sends and the fact that a browser runs JavaScript on the client.

    The main thing that identifies an HTTP client to the server is its headers, typically the user agent string. Since you have set the user agent to exactly the same value as your browser, there must be other checks in place.

    By default, cURL sends only a generic “Accept: */*” header, whereas browsers send a detailed Accept header that describes which content types they can handle. I expect the web server is checking something like this.

    [Screenshot: Copy HTTP request as cURL in Chrome Developer Tools]

    Chrome Developer Tools lets you copy the whole request as a cURL command (right-click the request in the Network tab, then Copy > Copy as cURL), including all the headers Chrome sent, so you can test it in the terminal.

    Try to match all of those headers exactly in your PHP script; the web server will then most likely not be able to tell that the request comes from a script.
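Combining the suggestions from both answers, a request could send the full browser header set, advertise compression via Accept-Encoding, and keep cookies between requests. The following is a minimal sketch, assuming the cURL extension is available. The helper names buildBrowserLikeOptions() and fetchLikeBrowser() and the cookie-file path are hypothetical, and the header values are simply the ones from the question; replace them with whatever “Copy as cURL” shows for your own browser. None of this helps if the site relies on JavaScript checks.

```php
<?php
// Sketch: cURL options that imitate a browser more closely.
// Hypothetical helpers; header values copied from the question's
// "Copy as cURL" output and meant to be replaced with your own.

function buildBrowserLikeOptions(string $url, string $cookieFile): array
{
    $headers = [
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
    ];

    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,   // browsers follow redirects too
        CURLOPT_HTTPHEADER     => $headers,
        // An empty string makes libcurl send an Accept-Encoding header listing
        // every encoding it supports and decompress the response automatically.
        CURLOPT_ENCODING       => '',
        // Save cookies the server sets and send them back on later requests.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
}

function fetchLikeBrowser(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildBrowserLikeOptions($url, $cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $body;
}
```

Usage: echo fetchLikeBrowser('https://www.example.com/', sys_get_temp_dir() . '/cookies.txt'); Reusing the same cookie file across runs means cookies set by one response are sent with the next request, which is roughly what a browser does.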
