I am having trouble finding individual html elements from the downloaded source code of a selected page. When I use the function $(data).find('p').length
it returns me the number 2 which is the correct answer, but if I use the function $(data).find('img').length
it returns me 0 and it should be 1.
async function getErrors() {
await $.ajax({
url: 'http://example.com',
method: 'get'
})
.done(async (siteText) => {
var data = $.parseHTML(siteText);
console.log(data);
console.log($(data).find('p').length);
console.log($(data).find('img').length);
await axios.get('http://anothersite.com')
.then((response) => {
//do something...
});
});
}
Live example:
var siteText = `<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Test Site</title>
<style>
.black{
background-color: black;
color: #333131;
}
</style>
</head>
<body>
<h1>Strona Testowa</h1>
<div>
<h2>Lorem Ipsum</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Convallis aenean et tortor at risus. Pellentesque habitant morbi tristique senectus. Nisi est sit amet facilisis. Vel elit scelerisque mauris pellentesque pulvinar. Quisque egestas diam in arcu. Elit at imperdiet dui accumsan sit amet nulla. Urna porttitor rhoncus dolor purus non enim praesent elementum. Velit dignissim sodales ut eu sem integer vitae justo eget. Lacus suspendisse faucibus interdum posuere lorem. Et ultrices neque ornare aenean euismod. Porttitor eget dolor morbi non. Sit amet consectetur adipiscing elit. Amet nisl suscipit adipiscing bibendum est. Eu non diam phasellus vestibulum. Neque convallis a cras semper auctor. Risus at ultrices mi tempus imperdiet nulla malesuada pellentesque elit. Et molestie ac feugiat sed lectus vestibulum. Adipiscing diam donec adipiscing tristique risus nec. Imperdiet proin fermentum leo vel. Nibh mauris cursus mattis molestie a iaculis at erat pellentesque. Elementum integer enim neque volutpat ac tincidunt vitae semper. Nam libero justo laoreet sit. Nibh tortor id aliquet lectus proin nibh nisl condimentum id. Et sollicitudin ac orci phasellus egestas tellus. Nunc sed augue lacus viverra vitae congue eu. Dui vivamus arcu felis bibendum ut. Mattis nunc sed blandit libero volutpat sed. Commodo sed egestas egestas fringilla phasellus faucibus scelerisque eleifend. Velit aliquet sagittis id consectetur purus ut faucibus pulvinar elementum. Quam vulputate dignissim suspendisse in est ante in nibh. Accumsan sit amet nulla facilisi morbi. Ac ut consequat semper viverra. Viverra tellus in hac habitasse platea dictumst. Donec ultrices tincidunt arcu non sodales neque. In est ante in nibh mauris. Mattis enim ut tellus elementum sagittis. Consectetur adipiscing elit pellentesque habitant morbi tristique senectus et netus. Sed id semper risus in. Vestibulum lectus mauris ultrices eros in cursus turpis massa. Vitae tempus quam pellentesque nec nam aliquam sem et tortor. In arcu cursus euismod quis viverra nibh cras. Sit amet consectetur adipiscing elit duis tristique. Augue ut lectus arcu bibendum at varius vel pharetra vel. Pharetra magna ac placerat vestibulum lectus mauris ultrices eros in. Libero nunc consequat interdum varius sit amet mattis vulputate. Netus et malesuada fames ac. In pellentesque massa placerat duis ultricies lacus sed turpis tincidunt. Tellus in hac habitasse platea dictumst vestibulum rhoncus est pellentesque. Duis convallis convallis tellus id interdum velit laoreet. Et tortor consequat id porta nibh venenatis cras. Laoreet sit amet cursus sit amet dictum sit amet justo.</p>
</div>
<img src="https://png.pngtree.com/png-clipart/20190108/ourmid/pngtree-tree-green-plant-photography-png-png-image_305004.jpg" >
<iframe width="560" height="315" src="https://www.youtube.com/embed/gK8s4LUJ7NE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<div class="black">
<p class="black">Lorem Ipsum</p>
</div>
</body>
</html>`;
var data = $.parseHTML(siteText);
console.log(data);
console.log($(data).find('p').length);
console.log($(data).find('img').length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
2
Answers
I tried with your code with another site and that’s working fine. I modified your JS to temporary get rid of async/await:
As an alternative you could use the
html()
function on a newly created element to parse your HTML. This way thefind()
function works because it looks for child elements of the new element.Detailed explenation:
HTML parsed by
parseHTML()
andhtml()
will ignore<html>
,<head>
and<body>
tags.So the parsing returns an array of the nodes in the head and body so the
find()
function runs on every element in that array when wrapped in a jQuery object directly. That’s whyfind()
can’t find the direct children of<body>
. Thefilter()
function works because it filters the array.By wrapping the result in a new element the find function will work correctly on the full
<body>
content since they are now children of the new element.