I am writing an audit for our website to check which detail pages are displayed for a given selection. The website is written and maintained by a third party. The issue is that when I fetch a page, for example:
$page = file_get_contents('https://example.com/search=red');
it returns the pre-generated source, which is only a template. The page then uses AJAX calls to fetch the relevant data and builds the final page from that. It is this generated code that I would like to parse to get the links to the detail pages.
I am thinking of something like wkhtmltopdf, except that it would operate on the generated source. Does anyone know of a PHP library that can do this, or a Linux package that can do this which I could call from PHP?
2 Answers
To handle dynamic content loading, you might want to consider using a headless browser or a tool designed for web scraping with JavaScript rendering capabilities. Puppeteer is one such tool, and it’s commonly used with Node.js for tasks like this. However, you can use a PHP library that wraps around a headless browser as well.
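As a sketch of the wrapper approach, assuming the chrome-php/chrome package (`composer require chrome-php/chrome`) and a local Chrome/Chromium install, rendering the page and grabbing the post-AJAX HTML might look like this:

```php
<?php
// Sketch using the chrome-php/chrome package; assumes a local
// Chrome/Chromium binary is available on the system.
require 'vendor/autoload.php';

use HeadlessChromium\BrowserFactory;

$browserFactory = new BrowserFactory();
$browser = $browserFactory->createBrowser();

try {
    $page = $browser->createPage();
    // Navigate and wait for the page to finish loading, so the AJAX
    // calls have run and the final DOM has been built.
    $page->navigate('https://example.com/search=red')->waitForNavigation();

    // The rendered HTML, after JavaScript has executed.
    $html = $page->getHtml();
    echo $html;
} finally {
    $browser->close();
}
```

Once you have the rendered HTML in `$html`, you can parse it for the detail-page links with a DOM parser of your choice.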
One such PHP library is Goutte, which uses Symfony components and provides a simple interface for web scraping. It’s built on top of Guzzle and Symfony BrowserKit. Here’s a basic example of how you might use Goutte for your task:
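A minimal sketch of extracting links with Goutte follows. One caveat worth noting: Goutte does not execute JavaScript, so against the live site it will see the same pre-rendered template; it is most useful once you have the rendered HTML or can call the AJAX endpoints directly. The URL and the bare `a` selector are placeholders; narrow the selector to match the detail-page links on the real site.

```php
<?php
// Sketch using Goutte to crawl a page and collect link hrefs.
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://example.com/search=red');

// Collect the href attribute of every matching anchor tag.
$links = $crawler->filter('a')->each(function ($node) {
    return $node->attr('href');
});

print_r($links);
```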
Remember to install Goutte and its dependencies using Composer: `composer require fabpot/goutte`
I don't quite understand what you are trying to accomplish; I guess you are trying to do some web scraping to get the content of various pages?
Nonetheless, here are some packages that will help you with this: