
Here is the code:

const puppeteer = require("puppeteer");

const getQuotes = async () => {
  // Start a Puppeteer session with:
  // - a visible browser (`headless: false` - easier to debug because you'll see the browser in action)
  // - no default viewport (`defaultViewport: null` - the page will use the full width and height of the window)
 
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });

  // Open a new page
  const page = await browser.newPage();

  await page.setExtraHTTPHeaders({
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
});

  // On this new page:
  // - open the "https://www.axs.com/browse/music" page
  // - wait until the network is almost idle (`networkidle2`: no more than 2 connections for at least 500 ms)
  await page.goto("https://www.axs.com/browse/music", {
    waitUntil: "networkidle2"});
    
    const quotes = await page.evaluate(()=>{
        // fetch every element that matches the listing selector
        const quoteList = document.querySelectorAll('#page-relative-block > div > div.layout-column--primary-lg > div > div > div:nth-child(2) > div > div:nth-child(2)');
        return Array.from(quoteList).map((quote)=>{

            //fetch the sub-elements of each element with the class quotes
        const text = quote.querySelector('.headliner').textContent;
        const author = quote.querySelector('.supporting').textContent;
       

        return {text, author}

        });
        
    });

    console.log(quotes)

    await browser.close();

};

// Start the scraping
getQuotes();

I tried removing textContent, but that didn't fix anything: it just returned undefined.
I also tried setting the user agent, as in the code above, but that doesn't fix anything either.

2 Answers


  1. From what I can see, the problem is not "textContent" itself. The error says it can't read properties of null, which means quote.querySelector('.headliner') returned null: you are accessing an element that does not exist in the DOM under that node. You can console.log the result of that call to check why it is null. I don't have the page's HTML, so I can't check it myself. I hope this helps you fix your issue.

  2. You are probably picking up elements in the DOM that are not quotes, or some of the quotes are missing the author or the quote text for whatever reason. A usable scraper needs to handle the unexpected.

    Just check that the elements you need exist:

            return Array.from(quoteList).map((quote)=>{
    
                //fetch the sub-elements of each element with the class quotes
                const quoteElem = quote.querySelector('.headliner');
                const authorElem = quote.querySelector('.supporting');
                if (!quoteElem || !authorElem) return null;
                const text = quoteElem.textContent;
                const author = authorElem.textContent;
    
                return {text, author}
            })
            .filter((o) => o);
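
The same existence check can also report which matches were skipped, which helps when working out why querySelector returns null. The following is only a sketch: `extractQuote`, `extractAll` and `stub` are hypothetical names, the stub objects merely stand in for real DOM nodes, and trimming the text and treating empty strings as missing are extra assumptions. To use it with Puppeteer, the two functions would have to be defined inside the page.evaluate callback, because functions from the Node scope are not available in the browser context.

```javascript
// Hypothetical helper: accepts any object that exposes querySelector(),
// so it can run against real DOM nodes or against the stubs below.
function extractQuote(node) {
  const headliner = node.querySelector('.headliner');
  const supporting = node.querySelector('.supporting');
  // Optional chaining yields undefined instead of throwing on a null match.
  const text = headliner?.textContent?.trim();
  const author = supporting?.textContent?.trim();
  // Assumption: an empty string counts as missing.
  if (!text || !author) return null;
  return { text, author };
}

// Collects the usable entries and records the index of every skipped match.
function extractAll(nodes) {
  const quotes = [];
  const skipped = [];
  Array.from(nodes).forEach((node, index) => {
    const quote = extractQuote(node);
    if (quote) quotes.push(quote);
    else skipped.push(index);
  });
  return { quotes, skipped };
}

// Stub elements standing in for real DOM nodes (illustration only).
const stub = (fields) => ({
  querySelector: (selector) =>
    selector in fields ? { textContent: fields[selector] } : null,
});

const result = extractAll([
  stub({ '.headliner': ' Band A ', '.supporting': 'Opener B' }),
  stub({ '.headliner': 'Band C' }), // no .supporting element
]);
console.log(result);
```

If `skipped` ends up containing every index, the outer selector is most likely matching a container rather than the individual listings.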
    