
Hey, I'm trying to scrape one specific value from a website. The markup looks like this:

    <td><a href="javascript:void(0)" class="rankRow"
           data-rankkey="25">
            Averages
        </a>
    </td>
    <td class="page_speed_602217763">
        82.84    </td>
  </tr>

I'm trying to get the number 82.84. The page_speed_* number varies, and the one constant that sets this row apart from the rest of the page source is the text "Averages".

I have tried using preg_match_all, but I can't get it to match across more than one line, including whatever is in between.

The code I have used is the following:

<?php
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $Player1Link);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $curlresult = curl_exec($curl);
    preg_match_all('!data-rankkey="25">Averages</a></td><td class="(d.*)</tr>!', $curlresult, $matches);
    print_r($matches);

    $P1AvgHigh = $matches[0][3];
    echo "<br>";
    echo $P1AvgHigh;
    curl_close($curl);
?>

Thanks in advance

2 Answers


  1. Firstly, your class declaration is incomplete and you've missed the contents of the second td … maybe this is an incomplete copy of your code? You also need to take into account the whitespace between and within the elements.

    This is my regex, which seems to work (but might need tweaking depending on your precise requirements and possible values in the content) …

    data-rankkey="25">[\s]*Averages[\s]*<\/a>[\s]*<\/td>[\s]*<td class="page_speed_([\d]*)">[\s]*([\d]*\.[\d]*)[\s]*<\/td>[\s]*<\/tr>

    I’ve escaped the forward slashes, which may not be necessary for you.

    For future reference, https://www.regexpal.com/ is a good tool for experimenting with regular expressions.
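
    A minimal sketch of how a pattern like this could be applied in PHP, assuming the markup looks exactly like the sample in the question. The $html string stands in for $curlresult, and the ~ delimiter is an arbitrary choice that removes the need to escape the forward slashes:

    ```php
    <?php
    // Sample markup copied from the question; in practice this would be $curlresult.
    $html = '<td><a href="javascript:void(0)" class="rankRow"
           data-rankkey="25">
            Averages
        </a>
    </td>
    <td class="page_speed_602217763">
        82.84    </td>
    </tr>';

    // \s* absorbs the newlines and indentation between and inside the tags.
    // With ~ as the delimiter, the slashes in the closing tags need no escaping.
    $pattern = '~data-rankkey="25">\s*Averages\s*</a>\s*</td>\s*'
             . '<td class="page_speed_(\d+)">\s*(\d+\.\d+)\s*</td>\s*</tr>~';

    if (preg_match($pattern, $html, $m)) {
        $classSuffix = $m[1];       // the varying part of the class name
        $average = (float) $m[2];   // the value from the second cell
        echo $average, "\n";
    } else {
        echo "Not found.\n";
    }
    ```

    preg_match is used here rather than preg_match_all because only one "Averages" row is expected; the captured groups then sit directly in $m[1] and $m[2].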

  2. You can simplify your regex. A big regex is always harder to maintain, especially if you later scrape another website:

    $pattern = '/class="page_speed_\d+">\s*(\d+\.\d+)\s*/';
    if (preg_match_all($pattern, $curlresult, $matches)) {
        $numbers = $matches[1];
        
        foreach ($numbers as $number) {
            echo $number . "\n";
        }
    } else {
        echo "Not found.";
    }
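
    The pattern above returns every page_speed_* value on the page. If only the "Averages" row is wanted, one option is to keep the short pattern but anchor it on the label. A minimal sketch, assuming the label always sits in an <a> inside the preceding <td>, as in the question; findRowValue and the two-row sample are made up for illustration:

    ```php
    <?php
    /**
     * Hypothetical helper: returns the number in the page_speed_* cell that
     * follows a cell whose link text is $label, or null if no such row exists.
     */
    function findRowValue(string $html, string $label): ?float
    {
        // preg_quote keeps any regex metacharacters in the label literal.
        $pattern = '~>\s*' . preg_quote($label, '~') . '\s*</a>\s*</td>\s*'
                 . '<td class="page_speed_\d+">\s*(\d+(?:\.\d+)?)\s*</td>~';

        return preg_match($pattern, $html, $m) ? (float) $m[1] : null;
    }

    // Made-up two-row sample, shaped like the markup in the question.
    $sample = '<tr><td><a class="rankRow" data-rankkey="24"> Highest </a></td>
    <td class="page_speed_111"> 120.50 </td></tr>
    <tr><td><a class="rankRow" data-rankkey="25"> Averages </a></td>
    <td class="page_speed_222"> 82.84 </td></tr>';

    $value = findRowValue($sample, 'Averages');
    echo $value === null ? "Not found.\n" : $value . "\n";
    ```

    Passing the label as a parameter means the same helper could be reused for other rows, as long as they follow the same cell layout.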
    