Here is the code:
const puppeteer = require("puppeteer");

const getQuotes = async () => {
  // Start a Puppeteer session with:
  // - a visible browser (`headless: false` - easier to debug because you'll see the browser in action)
  // - no default viewport (`defaultViewport: null` - the page is rendered at full width and height)
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });

  // Open a new page and send a desktop Chrome user-agent header with every request
  const page = await browser.newPage();
  await page.setExtraHTTPHeaders({
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
  });

  // On this new page:
  // - open the "https://www.axs.com/browse/music" website
  // - wait until the network is almost idle (no more than 2 open connections for 500 ms)
  await page.goto("https://www.axs.com/browse/music", {
    waitUntil: "networkidle2",
  });

  const quotes = await page.evaluate(() => {
    // Fetch every element matching this (very specific) container selector
    const quoteList = document.querySelectorAll('#page-relative-block > div > div.layout-column--primary-lg > div > div > div:nth-child(2) > div > div:nth-child(2)');
    return Array.from(quoteList).map((quote) => {
      // Read the text of the '.headliner' and '.supporting' elements inside each match
      const text = quote.querySelector('.headliner').textContent;
      const author = quote.querySelector('.supporting').textContent;
      return { text, author };
    });
  });

  console.log(quotes);
  await browser.close();
};

// Start the scraping
getQuotes();
I tried removing textContent, but that didn't fix anything; it just ended up returning undefined.
I also tried setting the user agent, as in the code above, but that doesn't fix anything either.
2 Answers
I analyzed the issue, and "textContent" is not what is null. The error says it can't read properties of null, which means quote.querySelector('.headliner') is returning null: you are accessing an element that does not exist in the DOM. You can console.log that call to check why it is null. I don't have the HTML, so I can't check it myself. I hope this fixes your issue.
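As a minimal sketch of that debugging step (not part of the original answer), the function below reports, for every element matched by the container selector, which child selectors return null. The name findMissingSelectors is hypothetical. One caveat worth knowing: a function passed to page.evaluate runs inside the browser, so a console.log there prints to the browser's DevTools console rather than your Node terminal. This version therefore returns its findings so they can be logged from Node.

```javascript
// Hypothetical debugging helper (not a Puppeteer API).
// For each element matched by containerSelector, list the child selectors that return null.
// `root` defaults to `document`, so the function can be passed straight to page.evaluate();
// outside a browser (e.g. in tests), pass a stand-in object with querySelectorAll().
function findMissingSelectors(containerSelector, childSelectors, root = document) {
  const containers = Array.from(root.querySelectorAll(containerSelector));
  const report = [];
  containers.forEach((container, index) => {
    // querySelector returns null when nothing inside `container` matches the selector
    const missing = childSelectors.filter((sel) => container.querySelector(sel) === null);
    if (missing.length > 0) {
      report.push({ index, missing });
    }
  });
  // `matched` tells you whether the container selector itself found anything at all
  return { matched: containers.length, report };
}
```

In the question's script this could be called as `console.log(await page.evaluate(findMissingSelectors, containerSelector, ['.headliner', '.supporting']))`, where containerSelector is a hypothetical variable holding the long selector string from the question. Puppeteer serializes the function's source and runs it in the page, so it cannot use variables from the surrounding Node code; everything it needs is passed as an argument.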
You are probably picking up things in the DOM that are not quotes, or some of the quotes are missing an author or a quote for whatever reason. A usable scraper needs to handle the unexpected.
Just check for the existence of the elements you need…
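Along the lines of that advice, here is a minimal sketch of an extraction function that checks for the elements before reading them instead of throwing. The name extractEvents is hypothetical, and the choice to require '.headliner' but treat '.supporting' as optional is an assumption; adjust which fields are required for your data. The { text, author } shape is kept from the question's code. Optional chaining (?.) short-circuits to undefined when querySelector returns null, and ?? then turns that into null.

```javascript
// Hypothetical replacement for the callback passed to page.evaluate() in the question.
// `root` defaults to `document` in the browser; pass a stand-in object when testing elsewhere.
function extractEvents(containerSelector, root = document) {
  return Array.from(root.querySelectorAll(containerSelector))
    .map((item) => {
      // If querySelector returns null, `?.` stops the chain and `??` substitutes null
      const text = item.querySelector('.headliner')?.textContent.trim() ?? null;
      const author = item.querySelector('.supporting')?.textContent.trim() ?? null;
      return { text, author };
    })
    // Assumption: an entry without headliner text is not useful, so drop it
    .filter((entry) => Boolean(entry.text));
}
```

It could be used as `const quotes = await page.evaluate(extractEvents, containerSelector);`, with containerSelector again a hypothetical variable holding the question's selector. Entries missing a supporting act are kept with author set to null rather than crashing the whole evaluate call.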