I am developing an application that crawls data from another website. That website is protected by a login, but I have an account there. My application should log in to that website and return the content of a protected web page. I managed to get this working in Python using the requests package.
Now I want to accomplish the same thing in PHP using cURL. Unfortunately, I haven't been able to make it work so far, and I would like your help.
Before you can log in, the website requires a verification token. So you first have to obtain the token and then log in. Here is my (working!) Python code:
import requests

url = "https://www.mywebsite.com/login.php"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
s = requests.Session()

# Get token
r1 = s.get(url, headers=headers)
cacheToken = ExtractTokenFromText(r1.text)  # some function defined by me

# Login
data = {'username': 'myusername',
        'password': 'mypassword',
        '__RequestVerificationToken': cacheToken}
r2 = s.post(url, headers=headers, data=data)
my_content = r2.text
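The helper ExtractTokenFromText is not shown in the code above. A minimal sketch of what such a helper could look like, assuming the token sits in a hidden `<input>` field named `__RequestVerificationToken` in the login page (an assumption; the real page markup is not shown, and `extract_token_from_text` and `sample_html` are illustrative names):

```python
from html.parser import HTMLParser


class _TokenParser(HTMLParser):
    """Remembers the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Only look at <input> tags, and stop once a token has been found.
        if tag != "input" or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.token = attributes.get("value")


def extract_token_from_text(html, field_name="__RequestVerificationToken"):
    """Return the hidden field's value, or None if the field is absent."""
    parser = _TokenParser(field_name)
    parser.feed(html)
    return parser.token


# Assumed shape of the login form, for illustration only.
sample_html = (
    '<form method="post">'
    '<input name="__RequestVerificationToken" type="hidden" value="abc123" />'
    "</form>"
)
print(extract_token_from_text(sample_html))
```

Note that this reads the token from the response body, which is what `r1.text` contains in the script above.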
Now I am trying to implement the same functionality in PHP using the cURL library. My PHP code is:
$url = "https://www.mywebsite.com/login.php";
$ch = curl_init();

// Get token
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, "curlResponseHeaderCallback");
$r1 = curl_exec($ch);

function curlResponseHeaderCallback($ch, $headerLine) {
    global $cache_token;
    $cache_token = ExtractTokenFromHeader($headerLine); // some function defined by me
    return strlen($headerLine); // Needed by curl
}

// Login
$post_data = array('username' => $myusername,
                   'password' => $mypassword,
                   '__RequestVerificationToken' => $cache_token);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post_data));
$r2 = curl_exec($ch);
$my_content = $r2;
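One detail about the callback above: cURL calls a CURLOPT_HEADERFUNCTION callback once per response header line, so the callback has to leave the stored token alone on lines that do not carry it. A rough Python sketch of that per-line logic, assuming the token arrives in a `Set-Cookie` header under a made-up cookie name (the real header and ExtractTokenFromHeader are not shown, so `TOKEN_COOKIE`, `token_from_header_line` and `response_headers` are all hypothetical):

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; the real one is not given in the question.
TOKEN_COOKIE = "__RequestVerificationToken"


def token_from_header_line(header_line, current_token=None):
    """Return the token found in this header line, else keep current_token."""
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "set-cookie":
        return current_token  # status line, other headers, blank line
    cookie = SimpleCookie()
    cookie.load(value.strip())
    if TOKEN_COOKIE in cookie:
        return cookie[TOKEN_COOKIE].value
    return current_token


# Illustrative header lines, fed one at a time as a header callback would see them.
response_headers = [
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/html; charset=utf-8\r\n",
    "Set-Cookie: __RequestVerificationToken=abc123; path=/; HttpOnly\r\n",
    "Cache-Control: no-cache\r\n",
    "\r\n",
]

token = None
for line in response_headers:
    token = token_from_header_line(line, token)
print(token)
```

The same guard in PHP would mean assigning to `$cache_token` only when the current line actually yields a token.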
The PHP file correctly receives the $cache_token, so the GET request is executed correctly. Unfortunately, the POST request is not working: the PHP file gives the following error message:
"Your antiforgery token is invalid." with an HTTP 400 Bad Request.
I tried many things to fix the problem, but none of them worked:
- Adding a user agent: curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36");
- Adding curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); to the second request.
- Adding curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile); with one of the following three options:
$cookiefile = getcwd() . '/cookie.txt';
$cookiefile = __DIR__ . '/cookie.txt';
$cookiefile = 'cookie.txt';
- Adding curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile); (as above)
- Adding curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
- Enabling error reporting to find the problem
- Using their API directly (it only works if you pay them)
I would like to stress that the Python version works, but the PHP version doesn't. (So I'm sure there are no missing or hidden parameters, captchas to handle, etc.)
My question is similar to this and this question, and to a lesser extent to this and this question, but their solutions either don't work for me or there are no answers at all…
2 Answers
So, thanks to @Steven Penny and this great YouTube video I finally managed to get it working. Key differences:
My final script:
It will depend on the site, of course, but I think I had a similar situation with GitHub. I have a GitHub login, and I wanted to programmatically access some info that wasn't available through the API. To log in to GitHub:
Then after that, you can make GET requests with CURLOPT_COOKIEFILE set to github.txt.
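The cookie-file idea can be illustrated in Python too, the language of the working script in the question. This is a minimal sketch, not the code used in this answer: Python's `http.cookiejar.MozillaCookieJar` reads and writes the same cookies.txt format that cURL uses, so cookies saved after a login can be loaded again for later requests. The cookie name `user_session`, its value, and the temporary directory are placeholders; only the file name github.txt comes from the text above.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain):
    """Build a simple cookie; every field here is illustrative."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={},
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    cookie_path = os.path.join(tmp_dir, "github.txt")

    # After the login request: write the session cookies to disk
    # (the role CURLOPT_COOKIEJAR plays in cURL).
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    jar.set_cookie(make_cookie("user_session", "placeholder-value", "github.com"))
    jar.save(ignore_discard=True)

    # Before later GET requests: read them back
    # (the role CURLOPT_COOKIEFILE plays in cURL).
    restored = http.cookiejar.MozillaCookieJar(cookie_path)
    restored.load(ignore_discard=True)
    restored_cookies = {cookie.name: cookie.value for cookie in restored}

print(restored_cookies)
```

Passing `ignore_discard=True` keeps session cookies that have no expiry date; without it they are skipped on save and load.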