I want to use cURL to scrape multiple pages of an online shop. The problem is that the URLs are SEO-friendly (or something like that) and look like this:
https://shopname.com/product-id-title-of-a-product.html
If I use the entire URL it works and I'm able to get the data I'm looking for, but the only part of that URL that I know is the ID:
https://shopname.com/product-294
Is there a way to scrape that URL in this case? The URL that only has the ID in it does redirect to the full URL.
This is the code that I'm using:
$curl = curl_init();
$url = 'https://shopname.com/product-294';
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
2 Answers
I think you need to capture the response headers of the cURL request. They should contain the redirect URL in the Location header, which you can parse out and then use in a second cURL request to fetch the page you are after.
Try using an app like Postman or Insomnia to inspect the response while you work this out.
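A minimal sketch of that two-step idea, assuming the shop answers a header-only (HEAD) request with the same redirect it sends for a normal request. The function names `extractLocation` and `resolveRedirect` are made up for illustration and are not part of cURL:

```php
<?php
// Hypothetical helper: return the value of the first Location header
// found in a raw block of response headers, or null if there is none.
function extractLocation(string $rawHeaders): ?string
{
    // "m" lets ^ match at the start of each header line, "i" ignores case
    // (HTTP/2 responses report header names in lowercase).
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $matches)) {
        return $matches[1];
    }
    return null;
}

// Hypothetical helper: request only the headers of $url and return the
// redirect target, or null if the request failed or did not redirect.
function resolveRedirect(string $url): ?string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HEADER, true); // include headers in the returned string
    curl_setopt($curl, CURLOPT_NOBODY, true); // header-only request, skip the body
    $headers = curl_exec($curl);

    return is_string($headers) ? extractLocation($headers) : null;
}
```

Calling `resolveRedirect('https://shopname.com/product-294')` should then give you the full URL, which you can pass to your existing code. Note that the Location value may be a relative path on some servers, in which case you would have to prepend the host yourself.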
cURL provides the option CURLOPT_FOLLOWLOCATION. According to the documentation, it makes cURL follow any Location header that the server sends as part of the response, and it does so recursively.
Therefore it is advisable to set CURLOPT_MAXREDIRS as well, for example to limit execution to a single redirect. With both options set you should be redirected to the full URL automatically, without any further programming.
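Applied to the code from the question, that could look roughly like the sketch below. The wrapper function `fetchFollowingRedirects` and its return shape are my own assumptions for illustration; only the `CURLOPT_*` and `CURLINFO_*` constants come from cURL itself:

```php
<?php
// Hypothetical wrapper: fetch $url, following at most $maxRedirects redirects.
// Returns the body (false on failure) and the URL that was finally requested.
function fetchFollowingRedirects(string $url, int $maxRedirects = 1): array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);     // follow Location headers
    curl_setopt($curl, CURLOPT_MAXREDIRS, $maxRedirects); // stop after this many redirects
    $body = curl_exec($curl);

    // The effective URL is the last URL cURL requested, i.e. the full product URL
    // once the redirect has been followed.
    $finalUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);

    return ['body' => $body, 'url' => $finalUrl];
}
```

With this, `fetchFollowingRedirects('https://shopname.com/product-294')['body']` should contain the product page. If the shop redirects more than once (for example from http to https first), a limit of 1 will make the request fail, so you may need to raise it.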