I am writing a PHP scraping program. It runs without errors, but the scraped result differs slightly from what I expect.
Here is my script:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $eng_SCCW_array["Here is my website"]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
$doc = new DOMDocument();
@ $doc->loadHTML($html);
$elements_content = $doc->textContent;
echo $elements_content."</br>"."</br>";
And here is the scraping result:
The problem is that some whitespace is missing, because the script does not read any ‘br’ elements. This makes the later data processing very complicated. I want to split the scraping result as in the image below. How can I do that?
2 Answers
First, check whether the response contains any tag or element that you can loop over, adding a line break between each one.
Right now you are probably getting the whole DOM as text, and your code only adds "</br>" once, at the end of the response. The code above gives you the whole DOM as text, like this: https://prnt.sc/-oPTh0o7oXk_ (see the screenshot).
You need to find the tag and add a br with the help of a loop.
If you can share the URL you get the response from, I will check the response type and try to add line breaks in between.
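One way to sketch the "find the tag and add a break with a loop" idea is to swap every br element for a newline text node before reading the text. The markup below is a hypothetical stand-in for the fetched page, since the real URL is not shown; the variable names are assumptions as well.

```php
<?php
// Hypothetical sample markup standing in for the curl_exec() result.
$html = '<div id="content">First line<br>Second line<br/>Third line</div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Copy the live node list into an array first: replacing nodes while
// iterating a live DOMNodeList can cause elements to be skipped.
$breaks = iterator_to_array($doc->getElementsByTagName('br'));
foreach ($breaks as $br) {
    $br->parentNode->replaceChild($doc->createTextNode("\n"), $br);
}

// Each former <br> is now a newline, so the text can be split on it.
$lines = explode("\n", $doc->documentElement->textContent);
print_r($lines);
```

After this, each element of $lines holds one visual line of the page, which should make the later processing simpler than working on one run-together string.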
If you are going to iterate through an array of URLs, you’ll need to couple instructions or a parsing function with each URL.
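A minimal sketch of that coupling, assuming two made-up URLs and made-up page structures: each URL is the key of an array whose value is the parsing function for that page. The helper names parseWith() and texts() are hypothetical, and the $pages array stands in for the curl responses so the example runs offline.

```php
<?php
/** Load an HTML string and hand a DOMXPath object to the given parser. */
function parseWith(string $html, callable $parser): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $parser(new DOMXPath($doc));
}

/** Collect the trimmed text of every node matched by an XPath expression. */
function texts(DOMXPath $xpath, string $expression): array
{
    $out = [];
    foreach ($xpath->query($expression) as $node) {
        $out[] = trim($node->textContent);
    }
    return $out;
}

// Hypothetical URLs, each coupled with its own extraction rule.
$parsers = [
    'https://example.com/news'  => fn(DOMXPath $x): array => texts($x, '//h2'),
    'https://example.com/notes' => fn(DOMXPath $x): array => texts($x, '//li'),
];

// Stand-in for the fetched HTML of each URL (normally from curl_exec()).
$pages = [
    'https://example.com/news'  => '<h2>One</h2><p>x</p><h2>Two</h2>',
    'https://example.com/notes' => '<ul><li>A</li><li>B</li></ul>',
];

$results = [];
foreach ($parsers as $url => $parser) {
    $results[$url] = parseWith($pages[$url], $parser);
}
print_r($results);
```

Keeping the rule next to the URL means a new site only needs one new array entry, rather than another branch inside the loop.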
For the provided url, I’d use XPath to target the desired content.
Code: (Demo)
Output:
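As a rough illustration of targeting content with XPath (not the code from the linked demo), the sketch below selects the non-empty text nodes inside one container, so each run of text between br elements becomes its own array entry. The markup and the id "main" are assumptions made for the example.

```php
<?php
// Hypothetical markup; the real page structure is not shown in the question.
$html = '<div id="main">Line one<br>Line two<br> <br>Line three</div>'
      . '<div id="footer">Ignore me</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// text() matches text nodes; [normalize-space()] drops whitespace-only ones.
$lines = [];
foreach ($xpath->query('//div[@id="main"]//text()[normalize-space()]') as $textNode) {
    $lines[] = trim($textNode->nodeValue);
}
print_r($lines);
```

Because the br elements separate the text nodes in the DOM, no string splitting is needed with this approach.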
Or, if you want the whole HTML chunk so that you can perform surgery on it, use saveXML(). (Demo)
Output:
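A small sketch of the saveXML() route, again on assumed markup rather than the real page: passing a node to saveXML() serializes just that node, and the resulting string can then be split wherever a br appears. The id "main" and the splitting pattern are illustrative choices.

```php
<?php
// Hypothetical markup standing in for the scraped page.
$html = '<div id="main">Line one<br>Line two</div><div id="footer">Ignore me</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$node  = $xpath->query('//div[@id="main"]')->item(0);
if ($node === null) {
    throw new RuntimeException('Target element not found');
}

// Serialize only the selected node, markup included.
$chunk = $doc->saveXML($node);

// Split on any spelling of the br tag, then strip the remaining markup.
$lines = array_map(
    fn(string $part): string => trim(strip_tags($part)),
    preg_split('~<br\s*/?>~i', $chunk)
);
print_r($lines);
```

This keeps the tags available for as long as they are useful and only discards them at the last step.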