
I’m trying to scrape data from a Looker Studio report using Puppeteer in Node.js, but because the report is rendered dynamically, the page body comes back empty when I fetch it. Here’s my code:

import puppeteer from 'puppeteer';

async function fetchData() {
  try {
    const url = 'https://lookerstudio.google.com/u/0/reporting/e36054dd-ffc0-4ef4-b8ab-4d10f7ab4cda/page/wmP0D';
    const options = {
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-accelerated-2d-canvas',
        '--no-first-run',
        '--no-zygote',
        '--single-process',
        '--disable-gpu'
      ],
      headless: true
    };
    const browser = await puppeteer.launch(options);
    const page = await browser.newPage();

    await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36');
    await page.setViewport({width: 1920, height: 1080});
    await page.setRequestInterception(true);
    page.on('request', (req) => {
      if (req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image') {
        req.abort();
      } else {
        req.continue();
      }
    });

    await page.goto(url, {waitUntil: 'networkidle0'});

    await page.waitForSelector('.looker-report', { timeout: 60000 });

    const text = await page.evaluate(() => {
      return document.body.innerText;
    });

    console.log(text);  // This logs an empty string

    await page.close();
    await browser.close();
  } catch (error) {
    console.error('Error fetching data:', error);
  }
}

fetchData();

The problem is that the text is always empty, even though I can see the data when I open the URL in a browser.

What I want to do: the page contains a table, and I am trying to fetch its first few rows.

I’ve tried:

  1. Waiting for the ‘.looker-report’ selector
  2. Using ‘networkidle0’ as the wait condition
  3. Setting a longer timeout

However, none of these approaches have worked. The page seems to load its content dynamically, and I’m not sure how to capture this data.

How can I modify my Puppeteer script to successfully scrape the dynamically loaded content from this Looker Studio report?

Any help or guidance would be greatly appreciated. Thank you!

2 Answers


  1. This can happen for a few reasons. The most likely one is a timeout that is too short; increase it so that the page has time to load its content completely. Secondly, rather than waiting only for the .looker-report element, identify the element that is populated once the data has been fetched and rendered, wait for that, and only then extract the data.
    Finally, page.setRequestInterception(true) lets you abort requests for resources you don’t need (stylesheets, fonts, images), so the page finishes loading sooner.

    The code below applies these changes (.data-element is a placeholder; replace it with the selector of the element that actually holds the data):

    import puppeteer from 'puppeteer';
    
    async function fetchData() {
      try {
        const url = 'https://lookerstudio.google.com/u/0/reporting/e36054dd-ffc0-4ef4-b8ab-4d10f7ab4cda/page/wmP0D';
        const options = {
          args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage',
            '--disable-accelerated-2d-canvas',
            '--no-first-run',
            '--no-zygote',
            '--single-process',
            '--disable-gpu'
          ],
          headless: true
        };
        const browser = await puppeteer.launch(options);
        const page = await browser.newPage();
    
        await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36');
        await page.setViewport({ width: 1920, height: 1080 });
    
    
        // Intercept network requests and skip resources that are not needed for the data.
        await page.setRequestInterception(true);
        page.on('request', (req) => {
          if (req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image') {
            req.abort();
          } else {
            req.continue();
          }
        });
    
        await page.goto(url, { waitUntil: 'networkidle0', timeout: 60000 });
    
        // Wait for the report container to appear in the DOM.
        await page.waitForSelector('.looker-report', { timeout: 60000 });
    
        // Wait for a custom condition: here, until the placeholder .data-element contains text.
        await page.waitForFunction(() => {
          const dataElement = document.querySelector('.data-element');
          return dataElement && dataElement.textContent.trim() !== '';
        }, { timeout: 60000 });
    
        const text = await page.evaluate(() => {
          return document.body.innerText;
        });
        console.log(text);
        await browser.close();
      } catch (error) {
        console.error('Error fetching data:', error);
      }
    }
    
    fetchData();
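
    As a minimal sketch of the custom wait used above, the condition can be pulled out into a small reusable helper. The names hasNonEmptyText and waitForText are hypothetical (they are not part of Puppeteer), and .data-element is still only a placeholder selector. The sketch relies on page.waitForFunction passing any arguments after the options object through to the page function.

    ```javascript
    // Runs inside the page: true once `selector` matches an element with non-empty text.
    function hasNonEmptyText(selector) {
      const el = document.querySelector(selector);
      return Boolean(el && el.textContent.trim() !== '');
    }

    // Hypothetical helper: resolves once the element has text, rejects on timeout.
    function waitForText(page, selector, timeout = 60000) {
      // Arguments after the options object are forwarded to hasNonEmptyText.
      return page.waitForFunction(hasNonEmptyText, { timeout }, selector);
    }

    // Usage inside fetchData(), after page.goto(...):
    //   await waitForText(page, '.data-element'); // placeholder selector
    ```

    Keeping the predicate as a separate function makes it possible to reuse the same wait for several selectors without repeating the inline arrow function.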

    Hope it helps 🙂

  2. I had a look at the document returned and don’t see a .looker-report element. I dug around, and it looks like the table has the class table, so I’m waiting for that instead.

    import puppeteer from 'puppeteer';
    
    async function fetchData() {
      try {
        const url = 'https://lookerstudio.google.com/u/0/reporting/e36054dd-ffc0-4ef4-b8ab-4d10f7ab4cda/page/wmP0D';
        const options = {
          args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage',
            '--disable-accelerated-2d-canvas',
            '--no-first-run',
            '--no-zygote',
            '--single-process',
            '--disable-gpu'
          ],
          // Show the browser window.
          headless: false
        };
        const browser = await puppeteer.launch(options);
        const page = await browser.newPage();
    
        await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36');
        await page.setViewport({width: 1920, height: 1080});
        await page.setRequestInterception(true);
        page.on('request', (req) => {
          if (req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image') {
            req.abort();
          } else {
            req.continue();
          }
        });
    
        // Use networkidle2 rather than networkidle0.
        await page.goto(url, {waitUntil: 'networkidle2'});
    
        await page.waitForSelector('.table', { timeout: 60000 });
    
        const text = await page.evaluate(() => {
          return document.body.innerText;
        });
    
        console.log(text);
    
        await page.close();
        await browser.close();
      } catch (error) {
        console.error('Error fetching data:', error);
      }
    }
    
    fetchData();
    

    Output:

    DLMM Max Fees Opportunities
    Reset
    Share
    arrow_drop_down
    DLMM Max Fees Opportunities
    Pair Name
    DEX
    Meteora
    Bin Step
    TVL
    FDV
    24hr Changes
    Max 1d Fees
    Max 1d Fees / TVL
    ▼
    Dog-SOL
    400
    5.3K
    794.4K
    -0.46%
    $38,745
    736.31%
    FIST-SOL
    80
    2.2K
    140.7K
    -0.92%
    $9,527
    425.47%
    BOB-SOL
    80
    4.8K
    1.6M
    4.66%
    $20,169
    417.32%
    EAR-SOL
    200
    4.5K
    3.7M
    -0.81%
    $3,732
    82.10%
    KENZO-SOL
    100
    2.4K
    336.1K
    0.75%
    $1,839
    78.08%
    EAR-SOL
    400
    107.5K
    3.7M
    -0.82%
    $76,579
    71.26%
    SARB-BOB
    400
    4.5K
    399.1K
    0.06%
    $3,118
    68.69%
    SIGMA-SOL
    250
    1.7K
    1.9M
    0.10%
    $977
    56.83%
    EAR-SOL
    80
    6.2K
    3.9M
    -0.82%
    $2,958
    47.66%
    MOB-SOL
    100
    50.9K
    5.2M
    0.82%
    $21,076
    41.42%
    1 - 50 / 97
    <
    >
    Data Last Updated: 15/07/2024 06:08:33
    Privacy Policy
    

    I don’t seem to get all of the expected content, but at least I’m getting something!
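
    Since the goal was the first few rows of the table, here is a rough sketch of turning that flat innerText output into row objects. It assumes the layout shown in the output above: data cells start right after the ▼ sort arrow, each row has seven cells, and the cells end at the pager line (for example 1 - 50 / 97). The column names are inferred from the header lines, and parseRows is a hypothetical helper, so adjust both if the report layout changes.

    ```javascript
    // Column names inferred from the header lines in the output (an assumption).
    const COLUMNS = ['Pair Name', 'Bin Step', 'TVL', 'FDV', '24hr Changes', 'Max 1d Fees', 'Max 1d Fees / TVL'];

    // Hypothetical helper: groups the flat list of cell lines into row objects.
    function parseRows(text, limit = Infinity, columns = COLUMNS) {
      const lines = text.split('\n').map((line) => line.trim()).filter((line) => line !== '');

      // Assumption: the data cells start right after the sort arrow.
      const arrow = lines.indexOf('▼');
      if (arrow === -1) return [];
      const start = arrow + 1;

      // Assumption: the cells end at the pager line, e.g. "1 - 50 / 97".
      let end = lines.findIndex((line, i) => i >= start && /^\d+ - \d+ \/ \d+$/.test(line));
      if (end === -1) end = lines.length;

      const cells = lines.slice(start, end);
      const rows = [];
      // Only complete rows are kept; a trailing partial row is ignored.
      for (let i = 0; i + columns.length <= cells.length && rows.length < limit; i += columns.length) {
        const row = {};
        columns.forEach((name, j) => { row[name] = cells[i + j]; });
        rows.push(row);
      }
      return rows;
    }

    // Usage inside fetchData(), after `text` has been read:
    //   console.log(parseRows(text, 3)); // first three rows
    ```

    Parsing the text this way is fragile because it depends on the exact order of lines in innerText; reading the cells from the DOM would be more robust once the report’s row and cell selectors are known.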
