
I am writing an audit for our website to see which detail pages are displayed for a given selection. The website is written and maintained by a third party. The issue is that when I pull the page like this:

$page = file_get_contents('https://example.com/search=red');

it returns the pre-generated source, which is only a template. The page then uses AJAX calls to fetch the relevant data and builds the final page from it. That generated markup is what I would like to parse to get the links to the detail pages.

I am thinking of something like wkhtmltopdf, only one that would output the generated source. Does anyone know of a PHP library that can do this, or a Linux package that I could call from PHP?

2 Answers


  1. To handle dynamic content loading, you might want to consider using a headless browser or a tool designed for web scraping with JavaScript rendering capabilities. Puppeteer is one such tool, and it’s commonly used with Node.js for tasks like this. However, you can use a PHP library that wraps around a headless browser as well.

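    A commonly used PHP scraping library is Goutte, which uses Symfony components and provides a simple interface for web scraping. It’s built on top of Guzzle and Symfony BrowserKit. Be aware that Goutte fetches and parses HTML but does not execute JavaScript, so it only sees links that are present in the markup the server returns. Here’s a basic example of how you might use Goutte for your task: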

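    <?php
    
    require_once 'vendor/autoload.php';
    
    use Goutte\Client;
    
    $client = new Client();
    
    // Fetch the page and wrap the response in a DomCrawler instance.
    $crawler = $client->request('GET', 'https://yoursite.com/search=red');
    
    // Collect every <a> element as a Link object.
    $links = $crawler->filter('a')->links();
    
    foreach ($links as $link) {
        // getUri() returns the absolute URL of the link.
        echo $link->getUri() . "\n";
    }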
    

    Remember to install Goutte and its dependencies using Composer:

    composer require fabpot/goutte
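
    For a page like the one in the question, where the links only exist after the AJAX calls have run, one option may be to let a headless browser render the page and then parse its output. The following is a minimal sketch, not a definitive implementation. It assumes a Chromium or Chrome binary is installed and reachable as `chromium` (the binary name is an assumption and varies by distribution), and the function names `renderPage` and `extractLinks` are hypothetical. Chromium's `--dump-dom` option prints the DOM after the page has loaded, but it may not wait for slow or late AJAX calls.

    ```php
    <?php
    // Sketch only: assumes a Chromium/Chrome binary is installed and on the PATH.
    // The default binary name and both function names are assumptions.

    /**
     * Ask headless Chromium to load the page, run its JavaScript,
     * and return the resulting DOM serialized as HTML.
     */
    function renderPage(string $url, string $binary = 'chromium'): string
    {
        $cmd = escapeshellcmd($binary)
             . ' --headless --disable-gpu --dump-dom '
             . escapeshellarg($url)
             . ' 2>/dev/null';
        $html = shell_exec($cmd);
        if (!is_string($html) || $html === '') {
            throw new RuntimeException("Rendering failed for $url");
        }
        return $html;
    }

    /**
     * Collect the distinct, non-empty href values of all <a> elements.
     */
    function extractLinks(string $html): array
    {
        $doc = new DOMDocument();
        // Real-world markup is rarely valid, so collect parser errors quietly.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $links = [];
        foreach ($doc->getElementsByTagName('a') as $anchor) {
            $href = trim($anchor->getAttribute('href'));
            if ($href !== '') {
                $links[] = $href;
            }
        }
        return array_values(array_unique($links));
    }

    // Example usage (commented out so that including this file has no side effects):
    // foreach (extractLinks(renderPage('https://example.com/search=red')) as $href) {
    //     echo $href, "\n";
    // }
    ```

    The hrefs are returned as written in the page, so relative links would still need to be resolved against the site's base URL. If the dump turns out to be taken before the data has arrived, a library that drives the browser and can wait for a specific element may be a better fit.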
    
  2. I don’t quite understand what you are trying to accomplish. I guess you are trying to do some web scraping to get the content of various pages?

    Nonetheless, here are some packages that will help you with this:
