I want to use curl to scrape multiple pages of an online shop. The problem I have is that the URLs are SEO-friendly (or something like that) and look like this:

https://shopname.com/product-id-title-of-a-product.html

If I use the entire URL it works and I’m able to get the data that I’m looking for, but the only variable part of that URL that I know is the ID:

https://shopname.com/product-294

Is there a way to scrape that url in this case?

The URL that only has the ID in it does redirect to the full URL.

This is the code that I’m using:

$curl = curl_init();
$url = 'https://shopname.com/product-294';

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($curl);

2 Answers


  1. I think you need to capture the response headers of the curl request. They should contain the redirect URL in the Location header, which you can then parse out and use for a second curl request to fetch the page you are after.
    Try using an app like Postman or Insomnia to inspect the responses while you work this out.
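
    A rough sketch of that two-step idea, in case it helps. The function names extractLocation and findRedirectTarget are made up for illustration, and it assumes the shop answers the ID-only URL with an absolute URL in its Location header (a relative one would still need to be resolved against the host):

    ```php
    <?php
    // Hypothetical helper: pull the Location header out of a raw header block.
    function extractLocation(string $rawHeaders): ?string
    {
        // Header names are case-insensitive ("i"); "m" lets ^ and $ match per line.
        if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
            return $m[1];
        }
        return null; // no redirect header present
    }

    // Hypothetical helper: request $url without following redirects
    // and return the redirect target, if the server sent one.
    function findRedirectTarget(string $url): ?string
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_HEADER, true); // prepend response headers to the output

        $response = curl_exec($curl);
        if ($response === false) {
            return null; // request failed
        }

        // Keep only the header part of the response before parsing it.
        $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
        return extractLocation(substr($response, 0, $headerSize));
    }
    ```

    The value returned by findRedirectTarget('https://shopname.com/product-294') could then be used as the URL of the second request.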

  2. Curl provides the option CURLOPT_FOLLOWLOCATION.

    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    

    The documentation states:

    TRUE to follow any “Location: ” header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many “Location: ” headers that it is sent, unless CURLOPT_MAXREDIRS is set).

    Therefore it would be advisable to set CURLOPT_MAXREDIRS as well, for example to limit the request to a single redirect:

    curl_setopt($curl, CURLOPT_MAXREDIRS, 1);
    

    With both options set, curl should follow the redirect to the full URL automatically, without any further programming.
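
    Put together with the code from the question, it might look roughly like this. The names redirectOptions and fetchProduct are only illustrative, and the error handling is an assumption about what you would want, not something the shop or curl requires:

    ```php
    <?php
    // Hypothetical helper: the option set for a redirect-following request.
    function redirectOptions(string $url, int $maxRedirects = 1): array
    {
        return [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,          // follow "Location:" headers
            CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
        ];
    }

    // Hypothetical helper: fetch $url and report where the request ended up.
    function fetchProduct(string $url, int $maxRedirects = 1): array
    {
        $curl = curl_init();
        curl_setopt_array($curl, redirectOptions($url, $maxRedirects));

        $body = curl_exec($curl);
        if ($body === false) {
            // For example, the server redirected more often than allowed.
            throw new RuntimeException(curl_error($curl));
        }

        return [
            'finalUrl' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL), // URL after redirects
            'body'     => $body,
        ];
    }
    ```

    Calling fetchProduct('https://shopname.com/product-294') would then give you the page content in 'body' and, in 'finalUrl', the full URL that the ID-only address redirected to.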
