PHP - How do I scrape multiple lines of the page source using cURL and preg_match_all

Ko1ind
August 24, 2023
180 views
0 votes
2 Answers

Hey, I'm trying to scrape a specific value from a website, from markup like this:

<td><a href="javascript:void(0)" class="rankRow"
       data-rankkey="25">
        Averages
    </a>
</td>
<td class="page_speed_602217763">
    82.84    </td>
</tr>

I want to get the number 82.84. The page_speed_* class suffix varies, and the only constant that distinguishes this row from the rest of the source is the text "Averages".

I have tried preg_match_all, but I can't get the pattern to match across more than one line, including whatever is in between.

The code I have used is the following:

<?php
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $Player1Link);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $curlresult = curl_exec($curl);
    preg_match_all('!data-rankkey="25">Averages</a></td><td class="(d.*)</tr>!', $curlresult, $matches);
    print_r($matches);

    $P1AvgHigh = $matches[0][3];
    echo "<br>";
    echo $P1AvgHigh;
    curl_close($curl);
?>

Thanks in advance

Answers

- timchessish
- August 24, 2023 at 10:45 am
- 0 votes
First, your class attribute is incomplete and you've left out the contents of the second td … maybe this is an incomplete copy of your code? You also need to account for the whitespace between and within every element.

This is my regex, which seems to work (but might need tweaking depending on your precise requirements and the possible values in the content) …

data-rankkey="25">[\s]*Averages[\s]*<\/a>[\s]*<\/td>[\s]*<td class="page_speed_([\d]*)">[\s]*([\d]*\.[\d]*)[\s]*<\/td>[\s]*<\/tr>

I’ve escaped the forward slashes, which may not be necessary for you.

For future reference, https://www.regexpal.com/ is a good tool for playing around with regular expressions.
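
As a rough illustration of the whitespace point, here is a minimal sketch that runs a whitespace-tolerant pattern against the markup from the question. The function name `extractAverage` and the hard-coded `$html` sample are assumptions made for the example; on the real page, `$html` would be the string returned by `curl_exec()`. The `!` delimiter from the question is reused, so the forward slashes need no escaping here.

```php
<?php
// Sample markup based on the question (opening <tr> added for completeness).
$html = <<<'HTML'
<tr>
    <td><a href="javascript:void(0)" class="rankRow"
           data-rankkey="25">
            Averages
        </a>
    </td>
    <td class="page_speed_602217763">
        82.84        </td>
</tr>
HTML;

// Hypothetical helper: returns the number from the cell that follows the
// "Averages" link, or null when the pattern does not match.
function extractAverage(string $html): ?string
{
    // \s* absorbs the newlines and indentation between and inside the tags.
    $pattern = '!data-rankkey="25">\s*Averages\s*</a>\s*</td>\s*'
             . '<td class="page_speed_\d+">\s*(\d+\.\d+)\s*</td>!';

    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1]; // first capture group: the decimal number
    }
    return null;
}

$average = extractAverage($html);
echo $average ?? 'Not found', "\n";
```

Since only one row is expected, `preg_match` is used instead of `preg_match_all`, and the value is read from capture group 1 rather than from `$matches[0]`, which holds the full matched text.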


- VincentDecaux
- August 24, 2023 at 10:56 am
- 0 votes
You can simplify your regex. A big regex is always harder to maintain, especially if you scrape another website:
```
$pattern = '/class="page_speed_\d+">\s*(\d+\.\d+)\s*/';
if (preg_match_all($pattern, $curlresult, $matches)) {
    $numbers = $matches[1];
    
    foreach ($numbers as $number) {
        echo $number . "\n";
    }
} else {
    echo "Not found.";
}
```
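
Note that the pattern above returns the number from every page_speed cell on the page. If only the "Averages" row is wanted, one possible variation (a sketch, not part of the answer above) is to anchor on the text "Averages" and use the `s` modifier so that `.` also matches newlines. The two-row sample and the "Doubles" label are made up for the example.

```php
<?php
// Hypothetical two-row sample; only the second row is the "Averages" row.
$html = <<<'HTML'
<tr>
    <td><a href="javascript:void(0)" class="rankRow" data-rankkey="24">
            Doubles
        </a>
    </td>
    <td class="page_speed_111">
        40.12        </td>
</tr>
<tr>
    <td><a href="javascript:void(0)" class="rankRow" data-rankkey="25">
            Averages
        </a>
    </td>
    <td class="page_speed_602217763">
        82.84        </td>
</tr>
HTML;

// "s" modifier: "." matches newlines too. ".*?" is lazy, so the match stops at
// the first page_speed cell that appears after the word "Averages".
$pattern = '/>\s*Averages\s*<.*?class="page_speed_\d+">\s*(\d+\.\d+)/s';

$rowAverage = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
echo $rowAverage ?? 'Not found', "\n";
```

One caveat with the lazy match: if the "Averages" row had no page_speed cell, the pattern would run on to the next such cell further down the page, so the stricter tag-by-tag pattern is safer when the markup is not uniform.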