I am scraping a manga page, https://battwo.com/title/75019-blue-lock, but I can’t get the info I want. These are the issues:
- The page opens but never closes. I already tried
page.setDefaultNavigationTimeout(timeout, 7000)
(it failed many times 🤣).
- The console.log doesn’t print the title const 😒
- I am using try/catch for errors, but am I missing something? 🤔
I’ve been using Playwright and Puppeteer documentation for the code.
I want to scrape the following elements from the page:
- title
- image
- status
- year 👈 this is what I’m hoping to get…
- genres
- synopsis
- artists
- authors
- uploaders
- To get just one element I’m trying to use
await page.$eval
- To get multiple elements I’m planning to use
await page.$$eval
then map the elements to get an array of all the content. When the scrape is finished, I want to pass the data to a CSV file, then convert the CSV to an Excel or Google sheet.
This is the code I’ve built so far:
import playwright from 'playwright';
(async () => {
  try {
    // Start the browser, observe the process. // Or 'chromium' or 'webkit'.
    const browser = await playwright.firefox.launch({ headless: false });
    // Create a new incognito browser context. Set User Agent Method (Avoid block requests).
    const context = await browser.newContext({
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' +
        ' AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    });
    // Create a new page in a pristine context. Set ViewportSize.
    const page = await context.newPage();
    await page.setViewportSize({ width: 640, height: 480 });
    // Page to get data from
    await page.goto('https://battwo.com/title/75019-blue-lock');
    page.setDefaultNavigationTimeout(timeout,7000)
    // Configure the Main Selector.
    await page.waitForSelector('main');
    // Configure the const Selector.
    // Extract the required info.
    await page.waitForSelector('#text "Blue Lock"');
    const title = await page.$eval('.#text "Blue Lock" h3', element => element.innerText);
    console.log(title);
    // Close all process
    await context.close();
    await browser.close();
  } catch (error) {
  }
})();
2 Answers
I suggest using the wpscript Ultimate Manga Scraper code. It uses a WordPress scraper as well as Puppeteer to scrape manga from Mangafox.
There’s a major problem in your code:
} catch (error) {
}
This Pokemon exception handler swallows all of your exceptions, giving you no feedback about your execution and leading you to think the problem has something to do with the navigation. Always log your errors. Adding
console.error(error)
into the catch block immediately tells you what you need to do to start fixing the script. The line in question is
page.setDefaultNavigationTimeout(timeout,7000)
where timeout is never defined, so the call throws a ReferenceError. You probably meant to call this like:
page.setDefaultNavigationTimeout(7000)
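To see the difference without launching a browser, here is a small sketch. The fakePage object and the runStep helper are hypothetical stand-ins, not part of Playwright. They only mimic the one method involved so the failure and the fix can be observed directly.

```javascript
// Hypothetical stand-in for a Playwright page, so this runs without a browser.
const fakePage = {
  navigationTimeout: null,
  setDefaultNavigationTimeout(ms) {
    this.navigationTimeout = ms;
  },
};

// Runs a step and returns the error it threw, or null if it succeeded.
function runStep(step) {
  try {
    step();
    return null;
  } catch (error) {
    console.error(error); // log instead of swallowing
    return error;
  }
}

// The call from the question: `timeout` is not declared anywhere,
// so evaluating the arguments throws before the method is even invoked.
const broken = runStep(() => fakePage.setDefaultNavigationTimeout(timeout, 7000));

// The corrected call: a single argument, in milliseconds.
const fixed = runStep(() => fakePage.setDefaultNavigationTimeout(7000));
```

With an empty catch block the first call would fail silently. With the log in place the ReferenceError is printed and the mistake is easy to spot.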
Before going further, the reason the process is hanging is that
browser.close()
is never called: the exception jumps straight to the catch block and skips it. Always put that in a finally block so your process can exit normally regardless of whether an error occurs or not.
Next bug:
'#text "Blue Lock"'
is not a CSS selector. Logs to the rescue once again.
Try using locators, and avoid hardcoding the text you want into the selector (you don’t know that in advance, right?).
Here’s my solution:
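A minimal sketch along those lines, assuming the title is the first h3 inside main (a guess based on the selectors in the question, not verified against the page). The scrapeTitle name and the injected launchBrowser function are illustrative choices, so that the caller decides which browser to launch.

```javascript
// Sketch only: 'main h3' is an assumption about the page's markup.
async function scrapeTitle(launchBrowser, url) {
  let browser;
  try {
    browser = await launchBrowser();
    const page = await browser.newPage();
    // Set the timeout before navigating so that it applies to goto().
    page.setDefaultNavigationTimeout(7000);
    await page.goto(url);
    // Locators wait for the element, so no separate waitForSelector is needed.
    const text = await page.locator('main h3').first().textContent();
    return text?.trim();
  } catch (error) {
    // Always log, never swallow.
    console.error(error);
  } finally {
    // Runs on success and on failure, so the process can exit.
    await browser?.close();
  }
}

// Usage with Playwright installed (`npm install playwright`):
//   const { firefox } = await import('playwright');
//   const title = await scrapeTitle(
//     () => firefox.launch({ headless: false }),
//     'https://battwo.com/title/75019-blue-lock',
//   );
//   console.log(title);
```

Passing the launcher in keeps the function independent of a particular browser, and the finally block guarantees the close call even when goto or the locator throws.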
Now that you’re back on track, I’ll leave it as an exercise to grab the rest of the data you want.
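For the CSV step mentioned in the question, here is a minimal sketch, assuming each scraped manga ends up as a plain object with one property per field. The csvField and toCsv names are hypothetical, and the sample record holds placeholder values rather than scraped data.

```javascript
// Quote a value for CSV: wrap it in double quotes and double any embedded
// quotes, but only when it contains a comma, a quote, or a line break.
function csvField(value) {
  const text = Array.isArray(value) ? value.join('; ') : String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string with a header row followed by one row per record.
function toCsv(records, columns) {
  const header = columns.map(csvField).join(',');
  const rows = records.map((record) =>
    columns.map((column) => csvField(record[column])).join(','),
  );
  return [header, ...rows].join('\r\n') + '\r\n';
}

// Placeholder record, not real scraped data.
const sample = [
  { title: 'Blue Lock', genres: ['genre-a', 'genre-b'], synopsis: 'He said "go", then left' },
];
const csv = toCsv(sample, ['title', 'genres', 'synopsis']);
console.log(csv);
```

Array fields such as genres are joined into a single cell here, which is one possible choice. Writing the string to a .csv file with Node's fs module gives a file that both Excel and Google Sheets can open.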