I am trying to scrape the h1 element from the HTML body of a particular website:
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
header('Content-Type: text/plain; charset=utf-8');
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: POST, GET, OPTIONS');

if (isset($_POST["url"])) {
    // The URL to scrape comes from the POST body ($url was previously undefined).
    $url = $_POST["url"];
    $user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3600);
    curl_setopt($ch, CURLOPT_TIMEOUT, 3600);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate checks; avoid in production
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);

    if (!curl_errno($ch)) {
        $resultStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($resultStatus == 200) {
            // Collect libxml parse warnings internally instead of silencing them with @.
            libxml_use_internal_errors(true);
            $DOM = new DOMDocument();
            $DOM->loadHTML('<?xml encoding="UTF-8">' . $html);
            $h1 = $DOM->getElementsByTagName('h1')->item(0);
            echo $h1 !== null ? $h1->textContent : "The page has no h1 element";
        } else {
            echo "Error: " . $resultStatus;
        }
    } else {
        // Transport-level failure (DNS, connect, TLS, timeout): show cURL's message.
        echo "No h1 found " . curl_error($ch);
    }
}
?>
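Separating the parsing step from the network step makes it easier to tell which one is failing. A minimal sketch (the function name extractFirstH1 is hypothetical) that pulls the first h1 out of an HTML string and returns null, rather than raising an error, when the page has no h1:

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <h1> in an HTML
// string, or null when the page has no <h1>. No network access is involved.
function extractFirstH1(string $html): ?string
{
    // Keep libxml's warnings about malformed HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // The XML-encoding prefix makes libxml treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $dom->getElementsByTagName('h1')->item(0);
    return $h1 === null ? null : trim($h1->textContent);
}

// Usage: parse a saved copy of the page, e.g.
// echo extractFirstH1(file_get_contents('page.html')) ?? 'no h1';
```

Testing it against a saved copy of the page confirms the parsing works, so any remaining failure on the live server is in fetching the page, not in reading it.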
I am trying to find the h1 element of this particular website:
https://neindiabroadcast.com/2023/03/24/bharat-gaurav-train-flagged-off-from-guwahati-for-arunachal-pradesh/
But I keep getting the following error:
No h1 found Failed to connect to neindiabroadcast.com port 443 after 15402 ms: Connection timed out
I tried increasing the connection timeout and execution timeout to 3600 seconds, but the result is still the same. How do I resolve this issue?
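The error appears after roughly 15 seconds, far below the 3600-second limits, which suggests that cURL's own timer is not what ends the attempt. cURL's error code can help confirm this: CURLE_OPERATION_TIMEDOUT (28) is returned when a limit set with CURLOPT_CONNECTTIMEOUT or CURLOPT_TIMEOUT expires, while CURLE_COULDNT_CONNECT (7) means the connection attempt itself failed, for example because it was refused or abandoned below cURL. The sketch below (the helper name probeUrl and the 20-second default are assumptions) reports the error code together with cURL's timing information:

```php
<?php
// Hypothetical helper: makes a single request and reports which kind of
// failure occurred, plus cURL's per-phase timings in seconds.
function probeUrl(string $url, int $connectTimeout = 20): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // keep the body out of the output
        CURLOPT_CONNECTTIMEOUT => $connectTimeout,
        CURLOPT_TIMEOUT        => $connectTimeout + 10,
    ]);
    curl_exec($ch);
    $errno = curl_errno($ch);

    if ($errno === 0) {
        $kind = 'ok';
    } elseif ($errno === CURLE_OPERATION_TIMEDOUT) {
        $kind = 'curl_timeout';      // cURL's own time limit expired
    } elseif ($errno === CURLE_COULDNT_CONNECT) {
        $kind = 'connect_failed';    // refused, unreachable, or timed out below cURL
    } else {
        $kind = 'other';
    }

    return [
        'kind'         => $kind,
        'errno'        => $errno,
        'error'        => curl_error($ch),
        'dns_time'     => curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
        'connect_time' => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
        'total_time'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
    ];
}
```

Calling it with the same URL on both servers and comparing kind and dns_time shows whether the live server resolves the host but cannot connect to it.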
EDIT #1: I’ve discovered that the error only occurs on my live server. When I run the code on my local server, the data is fetched successfully.
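Because the same code works locally, comparing the two machines at the network level, with neither cURL nor HTTP involved, can narrow things down. A minimal sketch using PHP's fsockopen(); the helper name tcpCheck and the 10-second timeout are assumptions:

```php
<?php
// Hypothetical helper: tries to open a plain TCP connection to $host:$port and
// reports whether it succeeded, the error (if any), and how long it took.
function tcpCheck(string $host, int $port, float $timeoutSeconds = 10.0): array
{
    $errno = 0;
    $errstr = '';
    $start = microtime(true);
    // @ hides the PHP warning; the error details are returned instead.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    $elapsedMs = (int) round((microtime(true) - $start) * 1000);

    if ($socket === false) {
        return ['ok' => false, 'error' => "[$errno] $errstr", 'ms' => $elapsedMs];
    }
    fclose($socket);
    return ['ok' => true, 'error' => null, 'ms' => $elapsedMs];
}

// Run on both machines and compare, e.g.:
// var_dump(tcpCheck('neindiabroadcast.com', 443));
// var_dump(gethostbyname('neindiabroadcast.com')); // IPv4 address each machine resolves
```

If the check fails on the live server but succeeds locally, the block is between the live server and the site (a firewall on the host, or the site refusing that server's IP range) rather than in the PHP code.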
2 Answers
I tested your code. Apart from some syntax errors, it works fine. Here, try this one:
The timeout could be due to a number of reasons:
I’d suggest using the "curl" command-line tool with the "-vvv" (high-verbosity) option to check the URLs that timed out, on the same machine that runs the PHP script. Check the output: if the result is the same (it times out just as it does in PHP), the problem is not your code but the underlying network or system configuration.
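If running the command-line tool on the live server is not possible (for example, on hosting without shell access), a similar verbose trace can be captured from PHP itself by pointing CURLOPT_STDERR at a temporary stream. A sketch; the helper name fetchVerbose and the 20-second default are assumptions:

```php
<?php
// Hypothetical helper: performs the request with CURLOPT_VERBOSE enabled and
// captures libcurl's verbose trace (normally written to STDERR) as a string.
function fetchVerbose(string $url, int $connectTimeout = 20): array
{
    $log = fopen('php://temp', 'w+');
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_VERBOSE        => true,
        CURLOPT_STDERR         => $log,  // write the trace here instead of STDERR
        CURLOPT_CONNECTTIMEOUT => $connectTimeout,
    ]);
    $body = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    unset($ch);  // release the handle before closing the log stream

    rewind($log);
    $trace = stream_get_contents($log);
    fclose($log);

    return ['body' => $body, 'errno' => $errno, 'error' => $error, 'trace' => $trace];
}

// Example: echo fetchVerbose($_POST['url'])['trace'];
```

The trace shows which IP address cURL tried and at which step it stopped, which is the same information "curl -vvv" prints on the command line.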