I want to scrape all pages of a website and get the meta description of each page, for example:
<meta name="description" content="I want to get this description of this meta tag" />
Similarly, for every other page I want to get its individual meta description.
Here is my code:
add_action('woocommerce_before_single_product', 'my_function_get_description');
function my_function_get_description($url) {
    // Download the raw HTML of the page as one string
    $the_html = file_get_contents('https://tipodense.dk/');
    print_r($the_html);
}
This print_r($the_html) gives me the HTML of the whole page. I don't know how to get the meta description of each page.
Kindly guide me, thanks.
2 Answers
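Have a look at preg_match and regular expressions.
Here it's quite simple:
https://regex101.com/r/JMcaUh/1
The description is caught by the capturing group () and stored in $matches[1] when you use preg_match (or in $matches[0][1] when you use preg_match_all with the PREG_SET_ORDER flag).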
EDIT: DOMDocument is a great solution too, but assuming you only want the description, a regex looks easier to me.
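A minimal sketch of the regex approach, since the linked pattern is not reproduced here. The pattern below is my own and only an assumption about what the link contains: it expects double-quoted attributes with name before content, exactly as in the question's example tag. The function name regex_meta_description is hypothetical.

```php
<?php
// Hypothetical helper: returns the meta description found in an HTML string,
// or null when no matching tag is present.
function regex_meta_description(string $html): ?string
{
    // Assumption: attributes are double-quoted and "name" precedes "content".
    $pattern = '/<meta\s+name="description"\s+content="([^"]*)"/i';

    if (preg_match($pattern, $html, $matches) === 1) {
        // $matches[0] is the full match, $matches[1] is the capturing group.
        return $matches[1];
    }

    return null;
}

// Sample input taken from the question.
$sample = '<meta name="description" content="I want to get this description of this meta tag" />';
echo regex_meta_description($sample), PHP_EOL;
```

For a live page you would pass the result of file_get_contents($url) instead of $sample. Tags that use single quotes or a different attribute order will not match this pattern.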
The better way to parse an HTML file is to use DOMDocument and, in many cases, combine that with DOMXPath to run queries on the DOM to find elements of interest. For instance, in your case, to extract the meta description you could do:
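The original snippet is missing from this copy of the answer, so the following is only a sketch of the DOMDocument plus DOMXPath approach described above. The function name dom_meta_description and the inline sample HTML are assumptions for illustration.

```php
<?php
// Hypothetical helper: extracts the meta description from an HTML string
// using DOMDocument and DOMXPath. Returns null when none is found.
function dom_meta_description(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the content attribute of <meta name="description">.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//meta[@name="description"]/@content');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->nodeValue);
}

// Sample input built around the tag from the question.
$sample = '<html><head>'
    . '<meta name="description" content="I want to get this description of this meta tag" />'
    . '</head><body></body></html>';
echo dom_meta_description($sample), PHP_EOL;
```

To run it against the page from the question, pass file_get_contents('https://tipodense.dk/') to the function instead of $sample.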
Which yields:
Using the sitemap (or part of it), you could do something like this:
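Again, the original code is not preserved here, so this is a sketch under assumptions: the site exposes a standard XML sitemap with <url><loc> entries, and the sitemap location shown in the final comment is a guess, not something confirmed for this site. All function names are hypothetical.

```php
<?php
// Hypothetical helper: returns the page URLs listed in a sitemap XML string.
function sitemap_urls(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    // local-name() sidesteps the sitemap's default XML namespace.
    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query('//*[local-name()="url"]/*[local-name()="loc"]') as $loc) {
        $urls[] = trim($loc->nodeValue);
    }

    return $urls;
}

// Hypothetical helper: extracts the meta description from an HTML string.
function page_meta_description(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//meta[@name="description"]/@content');

    return ($nodes === false || $nodes->length === 0)
        ? null
        : trim($nodes->item(0)->nodeValue);
}

// Fetches every page listed in the sitemap and maps URL => description.
// Pages that cannot be downloaded map to null.
function sitemap_descriptions(string $sitemapUrl): array
{
    $xml = @file_get_contents($sitemapUrl);
    if ($xml === false) {
        return [];
    }

    $descriptions = [];
    foreach (sitemap_urls($xml) as $pageUrl) {
        $html = @file_get_contents($pageUrl);
        $descriptions[$pageUrl] = ($html === false) ? null : page_meta_description($html);
    }

    return $descriptions;
}

// Usage (the sitemap path is an assumption):
// print_r(sitemap_descriptions('https://tipodense.dk/sitemap.xml'));
```

This makes one HTTP request per page, so for a large site you may want to process only part of the list, for example with array_slice() on the result of sitemap_urls(). A sitemap index file (one that lists other sitemaps rather than pages) is not handled by this sketch.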