I have a container on the website and I want to get all the <p> tags that are present in that particular container only.

<div class="c-product-review-card">
    <div class="c-product-review-card__container">
        <div class="c-product-review-card__left-column">
            <div class="c-product-review-user-info c-product-review-card__user-info">
                <h5 class="c-product-review-user-info__username u-spacer--1pt5">USERNAME</h5>
                <div class="c-product-review-user-info__details-container">
                    <div class="c-product-review-user-info__item">
                        <p class="o-text--caption c-product-review-user-info__details"><span
                                class="u-text--gray">Size:</span> ONE SIZE</p>
                        <p class="o-text--caption c-product-review-user-info__details"><span
                                class="u-text--gray">Color:</span> black mult...</p>
                        <p class="o-text--caption c-product-review-user-info__details"><span
                                class="u-text--gray">Height:</span> 5'3"</p>
                        <p class="o-text--caption c-product-review-user-info__details"><span
                                class="u-text--gray">Weight:</span> 135 lbs.</p>
                    </div>
                    <div class="c-product-review-user-info__item">
                        <p class="o-text--caption c-product-review-user-info__details"><span class="u-text--gray">Body
                                Type:</span> Pear</p>
                        <p class="o-text--caption c-product-review-user-info__details"><span class="u-text--gray">Bra
                                Size:</span> 34B</p>
                        <p class="o-text--caption c-product-review-user-info__details"><span
                                class="u-text--gray">Age:</span> 29</p>
                    </div>
                </div>
            </div>
        </div>
        <div class="c-product-review-card__details c-product-review-card__details--list"><!---->
            <div class="c-product-review-card__review-body-container">
                <div class="c-product-review-card__review-body">
                    <h4 class="u-spacer--1 c-product-review-card__review-title">Cute and breezy</h4>
                    <p class="o-text--caption">Packed this on a trip to Peru. It came in handy on those cool spring
                        nights there, perfect for strolling in Lima. It’s not too light and not too heavy. Worked well
                        with a simple outfit underneath </p>
                </div>
                <div class="c-product-review-card__review-picture-container"><!----></div>
            </div>
        </div>
    </div><!---->
</div>

This is the website HTML I’m trying to scrape.

I’ve been trying to use the evaluate function with the container to get all of the <p> tags, but it isn’t working. Please help!

const reviews = (await page.$$(cssSelectors.REVIEW_CARD_CONTAINER)).splice(3);

            let reviewsRes = {"reviews": []};
            for(const review of reviews.splice(0, 1)){
                const userName = await page.evaluate(el => el.querySelector('div > h5').textContent, review);
                console.log(userName);
                
            
                const pTags = await page.evaluate(`div > p`, (paragraphs) => {
                    return paragraphs.map((p) => p.textContent);
                }, review);
                console.log(pTags);
            }

2 Answers


  1. The best way to do this is probably to get the ElementHandle of the div, and use .$$ on it to get all of the p elements inside of the div. Some variation of the following should work:

    const container = await page.$("QUERY_FOR_CONTAINER");
    const tags = await container.$$("p");
    

    You should then just be able to iterate over the tags array to manipulate or extract data from them however you want.
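
    As a minimal sketch of that iteration, assuming `container` is a Puppeteer ElementHandle obtained as above: the helper name `getParagraphTexts` is hypothetical, and the whitespace cleanup is an optional extra rather than something the approach requires.

```javascript
// Sketch: collect the text of every <p> inside one container handle.
// `container` is assumed to be a Puppeteer ElementHandle (e.g. from page.$).
async function getParagraphTexts(container) {
  // Only <p> elements that are descendants of this container are matched
  const tags = await container.$$("p");
  const texts = [];
  for (const tag of tags) {
    // The callback runs in the browser context with the <p> element as `el`
    const text = await tag.evaluate(el => el.textContent);
    // Collapse runs of whitespace left over from the HTML formatting
    texts.push(text.replace(/\s+/g, " ").trim());
  }
  return texts;
}
```

    In the loop from the question, this would be called as `await getParagraphTexts(review)`, since each `review` is already an ElementHandle.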

  2. I’m not sure what output you expect, or the full structure of the HTML (which elements are many, which are one, etc), but here’s a general sketch you can adjust to meet your needs:

    const puppeteer = require("puppeteer"); // ^21.6.0
    
    const html = `::HTML copied from your question::`;
    
    let browser;
    (async () => {
      browser = await puppeteer.launch({headless: "new"});
      const [page] = await browser.pages();
      await page.setContent(html);
      const data = await page.$$eval(".c-product-review-card", els =>
        els.map(el => {
          const text = s =>
            el
              .querySelector(s)
              .textContent.replace(/\s+/g, " ")
              .trim();
          return {
            name: text(".c-product-review-user-info__username"),
            infoItems: [
              ...el.querySelectorAll(
                ".c-product-review-user-info__item"
              ),
            ].map(el =>
              Object.fromEntries(
                [...el.querySelectorAll("p")].map(e => [
                  e
                    .querySelector("span")
                    .textContent.trim()
                    .replace(/\s+/g, " ")
                    .replace(/:$/, ""),
                  e.childNodes[1].textContent.trim(),
                ])
              )
            ),
            reviewTitle: text(
              ".c-product-review-card__review-title"
            ),
            reviewBody: text(
              ".c-product-review-card__review-body p"
            ),
          };
        })
      );
      console.log(JSON.stringify(data, null, 2));
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
    

    Output:

    [
      {
        "name": "USERNAME",
        "infoItems": [
          {
            "Size": "ONE SIZE",
            "Color": "black mult...",
            "Height": "5'3\"",
            "Weight": "135 lbs."
          },
          {
            "Body Type": "Pear",
            "Bra Size": "34B",
            "Age": "29"
          }
        ],
        "reviewTitle": "Cute and breezy",
        "reviewBody": "Packed this on a trip to Peru. It came in handy on those cool spring nights there, perfect for strolling in Lima. It’s not too light and not too heavy. Worked well with a simple outfit underneath"
      }
    ]
    

    If the data is loaded asynchronously, don’t forget to use waitForSelector. If this doesn’t work, please share the page and exact expected output so I can validate it.
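
    A rough sketch of the waiting step, assuming `page` is a Puppeteer Page that has already navigated to the site: the helper name `getReviewTitles` and the 10-second timeout are illustrative choices, not values taken from the question.

```javascript
// Sketch: wait for at least one review card to exist before scraping.
// `page` is assumed to be a Puppeteer Page; the timeout is an arbitrary example.
async function getReviewTitles(page) {
  const cardSelector = ".c-product-review-card";
  // Rejects if no matching element appears within the timeout
  await page.waitForSelector(cardSelector, {timeout: 10000});
  return page.$$eval(cardSelector, els =>
    els.map(el =>
      // Yields null instead of throwing when a card has no title element
      el.querySelector(".c-product-review-card__review-title")
        ?.textContent.trim() ?? null
    )
  );
}
```

    Note that waitForSelector only guarantees the first matching card is present; if more cards load later, the result may be incomplete.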
