I'm scraping https://naamhinaam.com/baby-girl-names-a?page=${pageNumber}, but Puppeteer returns an empty object with no values. Here is my code:
const puppeteer = require("puppeteer");
const express = require("express");
const cors = require("cors");
const app = express();
app.use(cors());
let data = [];
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();
  for (let pageNumber = 1; pageNumber < 42; pageNumber++) {
    await page.goto(`https://naamhinaam.com/baby-girl-names-a?page=${pageNumber}`);
    await page.waitForTimeout(3000);
    await page.click("#promotionalPopup > div > div > div > button > span");
    await page.$eval(
      "div.name-suggestion.mt-1 > div > div:nth-child(22)",
      (el) => el.remove()
    );
    await page.$eval(
      "div.name-suggestion.mt-1 > div > div:nth-child(43)",
      (el) => el.remove()
    );
    for (let i = 3; i < 54; i++) {
      let fullName = "Null";
      if (await page.$("div.name-suggestion.mt-1 > div > div:nth-child(22)")) {
        continue;
      }
      if (await page.$("div.name-suggestion.mt-1 > div > div:nth-child(22)")) {
        continue;
      }
      await page.waitForSelector(
        `div.name-suggestion.mt-1 > div > div:nth-child(${i}) > div.nsg__name_meaning > a`
      );
      let element = await page.$(
        `div.name-suggestion.mt-1 > div > div:nth-child(${i}) > div.nsg__name_meaning > a`
      );
      fullName = await page.evaluate((el) => el.textContent, element);
      data.push({ fullName });
    }
    console.log(data);
  }
  await browser.close();
})();
app.get("/", (req, res) => {
  res.status(200).json(data);
});
app.listen(3000, () => {
  console.log("App is running...");
});
I am removing these elements in Puppeteer because they contain ads:
await page.$eval(
  "div.name-suggestion.mt-1 > div > div:nth-child(22)",
  (el) => el.remove()
);
await page.$eval(
  "div.name-suggestion.mt-1 > div > div:nth-child(43)",
  (el) => el.remove()
);
I'm looping over the pages and collecting the data here, but in the end I get an empty array.
2 Answers
Assuming you are trying to extract baby names and meanings, you can use the code below. I have updated the locator and removed the click on the popup, as it is not required since we are only extracting content.
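As a minimal sketch of this approach, assuming the name-link selector from the question still matches the page markup, all names on a page can be collected in one call instead of walking nth-child indexes. The helper name extractNames is hypothetical, and only names are covered here because the markup that holds the meanings is not shown in the question.

```javascript
// Selector parts taken from the question. It matches every name link inside
// the list, so ad blocks (which hold no such link) never show up in the result.
const NAME_LINK = "div.name-suggestion.mt-1 div.nsg__name_meaning > a";

// Hypothetical helper: `page` is a Puppeteer Page that has already navigated
// to one of the listing URLs.
async function extractNames(page) {
  // $$eval runs the callback in the browser with all matching elements.
  return page.$$eval(NAME_LINK, (links) =>
    links.map((a) => ({ fullName: a.textContent.trim() }))
  );
}

// Usage (inside an async function, after page.goto(...)):
//   data.push(...(await extractNames(page)));
```

Because the selector targets the links directly, neither the popup click nor the el.remove() calls are needed.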
Outputs
There is a :has() CSS pseudo-class that you can use instead of removing elements. Note that older Firefox versions do not support it, but it works fine with the Chromium that Puppeteer uses. A selector built with it gets the list without the elements you don't want.
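A sketch of such a selector, assuming the class names from the question's code are still current: it keeps only the rows that directly contain a name block, so the ad rows drop out without any el.remove() call. The names NAME_ROWS and getNameRows are hypothetical.

```javascript
// Rows of the list that have a name block as a direct child.
// Ad rows have no div.nsg__name_meaning child, so :has() filters them out.
const NAME_ROWS =
  "div.name-suggestion.mt-1 > div > div:has(> div.nsg__name_meaning > a)";

// Hypothetical helper: `page` is a Puppeteer Page on a listing URL.
async function getNameRows(page) {
  return page.$$eval(NAME_ROWS, (rows) =>
    rows.map((row) =>
      row.querySelector("div.nsg__name_meaning > a").textContent.trim()
    )
  );
}
```

Since the filtering happens in the selector itself, the positions of the ads (22 and 43 in the question) no longer matter.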
The popup that you're trying to close doesn't block you from getting data from the pages, so you don't need to click it.
The page.waitForTimeout() method is obsolete; use page.waitForSelector() instead.
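A minimal sketch of that replacement, assuming the name links from the question are a good signal that the list has rendered. The helper names pageUrl and openListPage and the 15-second timeout are illustrative choices, not part of the original code.

```javascript
const BASE_URL = "https://naamhinaam.com/baby-girl-names-a";
// Selector fragment from the question; assumed to match once the list is present.
const NAME_LINK = "div.nsg__name_meaning > a";

// Builds the listing URL for a given page number.
function pageUrl(pageNumber) {
  return `${BASE_URL}?page=${pageNumber}`;
}

// Hypothetical helper: navigates and waits for content, not for a fixed delay.
async function openListPage(page, pageNumber) {
  await page.goto(pageUrl(pageNumber), { waitUntil: "domcontentloaded" });
  // Resolves as soon as the first name link exists; rejects after 15 s.
  await page.waitForSelector(NAME_LINK, { timeout: 15000 });
}
```

Waiting on a selector returns as soon as the content is there and fails loudly when it never appears, whereas a fixed 3000 ms sleep does neither.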
You also have an error in your for loop: you aren't getting the last page, so pageNumber < 42 should be pageNumber <= 42.
Code:
Note: The part from // Skipable start to // Skipable end bypasses loading the resources matched by the regex to speed things up.
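A sketch of what such a skippable part might look like, using Puppeteer's request interception. The regex below is an assumption (images, fonts, and a couple of common ad hosts) and should be tuned for the site; shouldBlock and blockHeavyRequests are hypothetical names.

```javascript
// Assumed pattern: static assets by extension, plus two common ad hosts.
const BLOCKED =
  /\.(png|jpe?g|gif|svg|webp|woff2?|ttf)(\?.*)?$|googlesyndication|doubleclick/i;

// True when a request URL matches the block pattern.
function shouldBlock(url) {
  return BLOCKED.test(url);
}

// Call once per page, before the first page.goto().
async function blockHeavyRequests(page) {
  // Skipable start
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (shouldBlock(request.url())) {
      request.abort(); // never download the matched resource
    } else {
      request.continue(); // let everything else through unchanged
    }
  });
  // Skipable end
}
```

The scraper works without this step; it only reduces how much each of the listing pages has to download.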