I have a container on the website and I want to get all the <p>
tags that are present in that particular container only.
<div class="c-product-review-card">
<div class="c-product-review-card__container">
<div class="c-product-review-card__left-column">
<div class="c-product-review-user-info c-product-review-card__user-info">
<h5 class="c-product-review-user-info__username u-spacer--1pt5">USERNAME</h5>
<div class="c-product-review-user-info__details-container">
<div class="c-product-review-user-info__item">
<p class="o-text--caption c-product-review-user-info__details"><span
class="u-text--gray">Size:</span> ONE SIZE</p>
<p class="o-text--caption c-product-review-user-info__details"><span
class="u-text--gray">Color:</span> black mult...</p>
<p class="o-text--caption c-product-review-user-info__details"><span
class="u-text--gray">Height:</span> 5'3"</p>
<p class="o-text--caption c-product-review-user-info__details"><span
class="u-text--gray">Weight:</span> 135 lbs.</p>
</div>
<div class="c-product-review-user-info__item">
<p class="o-text--caption c-product-review-user-info__details"><span class="u-text--gray">Body
Type:</span> Pear</p>
<p class="o-text--caption c-product-review-user-info__details"><span class="u-text--gray">Bra
Size:</span> 34B</p>
<p class="o-text--caption c-product-review-user-info__details"><span
class="u-text--gray">Age:</span> 29</p>
</div>
</div>
</div>
</div>
<div class="c-product-review-card__details c-product-review-card__details--list"><!---->
<div class="c-product-review-card__review-body-container">
<div class="c-product-review-card__review-body">
<h4 class="u-spacer--1 c-product-review-card__review-title">Cute and breezy</h4>
<p class="o-text--caption">Packed this on a trip to Peru. It came in handy on those cool spring
nights there, perfect for strolling in Lima. It’s not too light and not too heavy. Worked well
with a simple outfit underneath </p>
</div>
<div class="c-product-review-card__review-picture-container"><!----></div>
</div>
</div>
</div><!---->
</div>
This is the website HTML I’m trying to scrape.
I’ve been trying to call page.evaluate with the container handle to get all the <p> tags inside it, but this is not working. Please help!
const reviews = (await page.$$(cssSelectors.REVIEW_CARD_CONTAINER)).splice(3);
let reviewsRes = {"reviews": []};
for (const review of reviews.splice(0, 1)) {
    const userName = await page.evaluate(el => el.querySelector('div > h5').textContent, review);
    console.log(userName);
    const pTags = await page.evaluate(`div > p`, (paragraphs) => {
        return paragraphs.map((p) => p.textContent);
    }, review);
    console.log(pTags);
}
2 Answers
The best way to do this is probably to get the ElementHandle of the div and use .$$ on it to get all of the p elements inside of the div. Some variation of that should work. You should then be able to iterate over the resulting tags array to manipulate or extract data from them however you want.
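A minimal sketch of the ElementHandle approach, assuming review is one of the handles returned by page.$$ in the question. The helper name getParagraphTexts is hypothetical; ElementHandle.$$ and ElementHandle.evaluate are the Puppeteer calls it relies on.

```javascript
// Hypothetical helper: `card` is a Puppeteer ElementHandle for one review card.
async function getParagraphTexts(card) {
  // ElementHandle.$$ searches only inside this element, not the whole page.
  const tags = await card.$$('p');
  const texts = [];
  for (const tag of tags) {
    // Read each <p>'s text in the browser and collapse line breaks and runs of spaces.
    const text = await tag.evaluate((el) => el.textContent.replace(/\s+/g, ' ').trim());
    texts.push(text);
  }
  return texts;
}
```

Inside the loop from the question this would be called as const pTags = await getParagraphTexts(review);.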
I’m not sure what output you expect, or the full structure of the HTML (which elements are many, which are one, etc), but here’s a general sketch you can adjust to meet your needs:
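One possible shape for such a sketch, assuming every review sits in a .c-product-review-card element laid out like the HTML in the question. The function names extractReviews and scrapeReviews and the returned field names are hypothetical; page.$$eval is the Puppeteer call doing the work.

```javascript
// Runs in the browser context via $$eval, so it must not reference
// anything defined outside its own body.
function extractReviews(cards) {
  // Collapse whitespace; return null when an element is missing.
  const text = (el) => (el ? el.textContent.replace(/\s+/g, ' ').trim() : null);
  return cards.map((card) => ({
    username: text(card.querySelector('.c-product-review-user-info__username')),
    title: text(card.querySelector('.c-product-review-card__review-title')),
    body: text(card.querySelector('.c-product-review-card__review-body > p')),
    // One entry per detail paragraph ("Size: ...", "Color: ...", and so on).
    details: Array.from(card.querySelectorAll('.c-product-review-user-info__details'), text),
  }));
}

// Hypothetical wrapper: `page` is a Puppeteer Page already on the product page.
async function scrapeReviews(page) {
  return page.$$eval('.c-product-review-card', extractReviews);
}
```

Each query is scoped to its own card, so paragraphs from other reviews never leak into the result. The selectors are taken from the posted markup and will need adjusting if the real page differs.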
If the data is loaded asynchronously, don’t forget to use waitForSelector. If this doesn’t work, please share the page and exact expected output so I can validate it.
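As a rough illustration of the waitForSelector advice, the sketch below waits for the first review card before collecting handles. The helper name waitForReviewCards and the 10-second default timeout are assumptions, not values from the original posts.

```javascript
// Hypothetical helper: `page` is a Puppeteer Page.
async function waitForReviewCards(page, timeoutMs = 10000) {
  const selector = '.c-product-review-card';
  // Resolves once at least one matching element is in the DOM;
  // rejects if none appears before the timeout expires.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // Only now collect the handles, so the array is not empty by accident.
  return page.$$(selector);
}
```

The returned handles can then be passed to whatever per-card extraction you use.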