Hey, I'm trying to scrape a specific value from a website. The markup looks like this:
<td><a href="javascript:void(0)" class="rankRow"
data-rankkey="25">
Averages
</a>
</td>
<td class="page_speed_602217763">
82.84 </td>
</tr>
I'm trying to get the number 82.84. The number after page_speed_ varies, and the only constant that distinguishes this row from the rest of the page source is the text "Averages".
I have tried using preg_match_all, but I can't get it to match across more than one line and whatever is in between.
This is the code I have used:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $Player1Link);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$curlresult = curl_exec($curl);
preg_match_all('!data-rankkey="25">Averages</a></td><td class="(\d.*)</tr>!', $curlresult, $matches);
print_r($matches);
$P1AvgHigh = $matches[0][3];
echo "<br>";
echo $P1AvgHigh;
curl_close($curl);
?>
Thanks in advance
2 Answers
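Firstly, your class declaration is incomplete and you’ve missed the contents of the second td… maybe this is an incomplete copy of your code? You also need to account for the whitespace, including line breaks, between and within every element. By default the dot in a PCRE pattern does not match a newline unless you add the s modifier, while \s matches any whitespace, including newlines.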
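To see why the line breaks matter, here is a small sketch run against the sample markup from the question (pasted into a nowdoc string instead of being fetched with cURL). It compares a dot-based pattern with and without the s modifier, and a pattern that steps over the line breaks with \s*.

```php
<?php
// Sample markup copied from the question; note the line breaks
// between the tags and around "Averages".
$html = <<<'HTML'
<td><a href="javascript:void(0)" class="rankRow"
data-rankkey="25">
Averages
</a>
</td>
<td class="page_speed_602217763">
82.84 </td>
</tr>
HTML;

// Without the "s" modifier, "." does not match "\n", so ".*" cannot
// get from "Averages" to "page_speed", which is on a later line.
$withoutS = preg_match('!Averages.*page_speed!', $html);

// With the "s" (dotall) modifier, "." also matches newlines.
$withS = preg_match('!Averages.*page_speed!s', $html);

// "\s*" matches any whitespace, including newlines, between the tags.
$withWhitespace = preg_match('!Averages\s*</a>\s*</td>\s*<td class="page_speed_!', $html);

var_dump($withoutS);       // int(0)
var_dump($withS);          // int(1)
var_dump($withWhitespace); // int(1)
```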
This is my regex, which seems to work (but might need tweaking depending on your precise requirements and the possible values in the content):
data-rankkey="25">[\s]*Averages[\s]*<\/a>[\s]*<\/td>[\s]*<td class="page_speed_([\d]*)">[\s]*([\d]*\.[\d]*)[\s]*<\/td>[\s]*<\/tr>
I’ve escaped the forward slashes, which isn’t necessary if you use a delimiter other than /, such as the ! in your code.
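A minimal sketch of plugging that regex into PHP, assuming the page HTML ends up in $curlresult as in the question (here it is replaced by the sample row so the snippet runs on its own). It uses preg_match rather than preg_match_all since only one value is wanted; $matches[0] is the whole match, so the number is in capture group 2, not at $matches[0][3].

```php
<?php
// Stand-in for the page fetched with cURL in the question ($curlresult);
// here it is just the sample row from the question.
$curlresult = <<<'HTML'
<td><a href="javascript:void(0)" class="rankRow"
data-rankkey="25">
Averages
</a>
</td>
<td class="page_speed_602217763">
82.84 </td>
</tr>
HTML;

// The pattern from this answer, with "/" delimiters, so the
// escaped forward slashes ("\/") are required here.
$pattern = '/data-rankkey="25">[\s]*Averages[\s]*<\/a>[\s]*<\/td>[\s]*'
         . '<td class="page_speed_([\d]*)">[\s]*([\d]*\.[\d]*)[\s]*<\/td>[\s]*<\/tr>/';

if (preg_match($pattern, $curlresult, $matches)) {
    // $matches[1] holds the varying page_speed_ suffix,
    // $matches[2] holds the value next to "Averages".
    $P1AvgHigh = $matches[2];
    echo $P1AvgHigh, "\n"; // 82.84
}
```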
For future reference, https://www.regexpal.com/ is a good tool for experimenting with regular expressions.
You can simplify your regex; a big regex is always harder to maintain, especially if you scrape another website:
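As one illustration of a shorter pattern (an assumption for this sketch, not necessarily what this answer proposed), you can anchor only on the constant text "Averages", lazily skip to the first page_speed_ cell with the s modifier, and capture the number that follows. The trade-off is that it no longer checks data-rankkey or the exact tag sequence, so it relies on that cell being the first page_speed_ cell after "Averages".

```php
<?php
// Sample row from the question, used instead of a live cURL fetch.
$html = <<<'HTML'
<td><a href="javascript:void(0)" class="rankRow"
data-rankkey="25">
Averages
</a>
</td>
<td class="page_speed_602217763">
82.84 </td>
</tr>
HTML;

// Illustrative simpler pattern (assumption): "Averages", then the shortest
// run of any characters (the "s" modifier lets "." cross line breaks) up to
// the page_speed_ cell, then capture the digits and dots inside it.
if (preg_match('~Averages.*?class="page_speed_\d+">\s*([\d.]+)~s', $html, $m)) {
    echo $m[1], "\n"; // 82.84
}
```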