
I want to parse data from the site https://csfloat.com/search?def_index=4727, and I am using Puppeteer.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://csfloat.com/search?def_index=4727");

  const arr = await page.evaluate(() => {
    const priceElements = document.getElementsByClassName("price ng-star-inserted");
    const array = [];
    for (let i = 0; i < priceElements.length; i++) {
      array.push(priceElements[i].innerText);
    }
    return array; // evaluate runs in the page; the result must be returned to Node
  });

  console.log(arr);
})()

But the problem is that when I run this script, it opens its own browser and loads the page, where I am not authorized, so I can't parse the data. Even if I enter my login and password, I still have to confirm the sign-in in Steam. How can I do this from my own browser, where I am already authorized? Or how else can I fix this problem, maybe with another library?

2 Answers


  1. You can always use your beloved browser's developer tools.
    Open them and select the Console tab to write your own script there.

    [screenshot: using console.log in the Console tab]
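    For example, a quick Console-side script for the page in the question might look like this. The class name comes from the question; the number parsing is only a guess at the price format, so treat it as a sketch:

```javascript
// Turn a price label like "$12.34" into a number; the "$" / comma format
// is an assumption about how csfloat renders prices.
function parsePrice(text) {
  const match = text.replace(/,/g, "").match(/[\d.]+/);
  return match ? Number(match[0]) : null;
}

// In the DevTools Console, on the tab where you are already logged in,
// you would combine it with a selector like:
//
//   const prices = [...document.querySelectorAll(".price.ng-star-inserted")]
//     .map(el => parsePrice(el.innerText));
//   console.log(prices);

console.log(parsePrice("$1,234.56")); // -> 1234.56
```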

    Or you can also use the Recorder tab when you want to automate a routine task that runs daily or hourly.
    You can access it by selecting the double-chevron button on the tab bar.

    [screenshot: how to access the Recorder tab]

    There, you can automate many things: clicks, scrolls, even waiting for an element to exist and become visible. You can always export the recording as a Puppeteer script if you like.

    [screenshot: the Recorder tab]

    I hope this helps.

  2. Edi gives some good suggestions, but to supplement those, here are a few other approaches. There’s no silver bullet in web scraping, so you’ll need to experiment to see what works for a particular site (I don’t have a Steam account).

    1. Launch Puppeteer with the userDataDir flag, then run it once headfully with a long timeout or REPL and login manually. Kill the script without trying to automate the site yet. The session should be saved, so on subsequent runs, you’ll be pre-authorized and you can automate as normal.
      • The major drawback is that the session may expire within hours or days, which might be a deal-breaker. But for sites that persist sessions for months, this could be a viable option.
      • A variant on this is extracting the session cookie by hand and copying it into a plain Node fetch call. A simple example of this strategy is here. The same caveats as above apply.
    2. Connect to an existing browser session with Puppeteer and login manually.
    3. Without Puppeteer, you can keep a normal browser session open in a tab with a userscript or console code (as Edi showed) that extracts the data you want and sends it to a server for processing (writing to file, etc). I wrote a blog post on this technique. Using the recorder feature is a variant on this.
    4. Automate the login process fully with Puppeteer so the script can run end to end. This might be tricky for certain auth strategies like Google and Steam, which take great pains to prevent this. This is the only truly scalable option, but not all automations need to scale.
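    A minimal sketch of option 1, assuming an arbitrary profile directory name (`./csfloat-profile`) and an opt-in environment variable so the browser only launches when you ask for it:

```javascript
// Persistent-profile launch options; "./csfloat-profile" is an arbitrary
// writable directory, nothing csfloat-specific.
const launchOptions = {
  headless: false,                  // headful, so the Steam login can be done by hand
  userDataDir: "./csfloat-profile", // cookies/session data persist here between runs
};

async function scrapeWithSavedSession() {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  await page.goto("https://csfloat.com/search?def_index=4727");
  // First run: log in manually (including the Steam confirmation), then exit.
  // Later runs reuse the saved session, so you can scrape as normal:
  const prices = await page.$$eval(".price.ng-star-inserted", els =>
    els.map(el => el.innerText)
  );
  console.log(prices);
  await browser.close();
}

// Guarded so requiring/inspecting this file never launches a browser.
if (process.env.RUN_SCRAPE) scrapeWithSavedSession();
```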
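    Option 2 can be sketched with `puppeteer.connect`: start your everyday Chrome with remote debugging enabled (e.g. `chrome --remote-debugging-port=9222`), log in to the site normally, then attach. The port is just the conventional default, and the same opt-in guard as above keeps this from running accidentally:

```javascript
// DevTools endpoint of a Chrome you started yourself with
// --remote-debugging-port=9222 (9222 is the conventional choice).
const DEVTOOLS_URL = "http://127.0.0.1:9222";

async function attachToRunningChrome() {
  const puppeteer = require("puppeteer");
  // connect() attaches to the already-running browser instead of launching
  // a fresh one, so its session -- including your Steam login -- is reused.
  const browser = await puppeteer.connect({ browserURL: DEVTOOLS_URL });
  const page = await browser.newPage();
  await page.goto("https://csfloat.com/search?def_index=4727");
  const prices = await page.$$eval(".price.ng-star-inserted", els =>
    els.map(el => el.innerText)
  );
  console.log(prices);
  browser.disconnect(); // detach, but leave the browser itself running
}

if (process.env.RUN_SCRAPE) attachToRunningChrome();
```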
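    For option 3, the extraction half might look like the snippet below. The helper is kept pure (it takes any list of element-like objects) so it can be exercised outside a browser; the `http://localhost:3000/prices` endpoint is a made-up address for whatever collector server you run:

```javascript
// Pure helper: pull trimmed text out of element-like objects. Works on a real
// NodeList in the browser and on plain objects in tests.
function collectPrices(elements) {
  return Array.from(elements, el => el.innerText.trim());
}

// In the Console (or a userscript) on the logged-in tab you would then run:
//
//   const prices = collectPrices(document.querySelectorAll(".price.ng-star-inserted"));
//   fetch("http://localhost:3000/prices", {          // your own collector server
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(prices),
//   });

console.log(collectPrices([{ innerText: " $12.34 " }, { innerText: "$5.00" }]));
```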