I’m trying to grab products from eBay and open them on Amazon.
So far, I have them being searched on Amazon, but I’m struggling to get the products selected from the search results.
Currently it’s outputting a blank array and I’m not sure why. I have tested this in a separate script without grabTitles and the for loop, so I’m guessing something in there is causing the issue.
Is there something I am missing here that’s preventing the data coming back for prodResults?
const puppeteer = require('puppeteer');
const URL = "https://www.amazon.co.uk/";
const selectors = {
  searchBox: '#twotabsearchtextbox',
  productLinks: 'span.a-size-base-plus.a-color-base.a-text-normal',
  productTitle: '#productTitle'
};
(async() => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto('https://www.ebay.co.uk/sch/jmp_supplies/m.html?_trkparms=folent%3Ajmp_supplies%7Cfolenttp%3A1&rt=nc&_trksid=p2046732.m1684');
  //Get product titles from ebay
  const grabTitles = await page.evaluate(() => {
    const itemTitles = document.querySelectorAll('#e1-11 > #ResultSetItems > #ListViewInner > li > .lvtitle > .vip');
    var items = []
    itemTitles.forEach((tag) => {
      items.push(tag.innerText)
    })
    return items
  })
  //Search for the products on amazon in a new tab for each product
  for (i = 0; i < grabTitles.length; i++) {
    const page = await browser.newPage();
    await page.goto(URL)
    await page.type(selectors.searchBox, grabTitles[i++])
    await page.keyboard.press('Enter');
    //get product titles from amazon search results
    const prodResults = await page.evaluate(() => {
      const prodTitles = document.querySelectorAll('span.a-size-medium.a-color-base.a-text-normal');
      let results = []
      prodTitles.forEach((tag) => {
        results.push(tag.innerText)
      })
      return results
    })
    console.log(prodResults)
  }
})()
2 Answers
You’ve hit on an age-old problem with Puppeteer: knowing when a page has fully finished loading or rendering.
You could try adding the following:
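A minimal sketch of that idea, assuming the wait goes where the script presses Enter. The helper name pressEnterAndSettle and the setTimeout-based sleep are illustrative choices, not Puppeteer APIs; the sleep stands in for page.waitForTimeout, which is not available in every Puppeteer version.

```javascript
// Plain-JS pause, used here in place of page.waitForTimeout.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `page` is assumed to be a Puppeteer Page whose search box is already filled in.
async function pressEnterAndSettle(page, extraMs = 10000) {
  // Start waiting for the navigation before pressing Enter so it is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.keyboard.press('Enter'),
  ]);
  // Arbitrary extra pause for content rendered after the network goes quiet.
  await sleep(extraMs);
}
```

In the question's loop this would replace the bare `await page.keyboard.press('Enter');` line, for example `await pressEnterAndSettle(page, 10000);`.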
Usually I find networkidle2 isn’t always reliable enough, so I add an arbitrary extra waitForTimeout. You’ll need to play around with the timeout value (10000 = 10 seconds) to get what you’re looking for. It’s not ideal, I know, but I’ve not found a better way.

There are a few potential problems with the script:
- await page.keyboard.press('Enter'); triggers a navigation, but your code never waits for the navigation to finish before trying to select the result elements. Use waitForNavigation, waitForSelector or waitForFunction (not waitForTimeout).
  If you do wait for a navigation, there’s a special pattern using Promise.all needed to avoid a race condition, shown in the docs.
  Furthermore, you might be able to skip a page load by going directly to the search URL by building the string yourself. This should provide a significant speedup.
- Your code spawns a new page for every item that needs to be processed, but these pages are never closed. I see grabTitles.length as 60. So you’ll be opening 60 tabs. That’s a lot of resources being wasted. On my machine, it’d probably hang everything. I’d suggest making one page and navigating it repeatedly, or close each page when you’re done. If you want parallelism, consider a task queue or run a few pages simultaneously.
- grabTitles[i++]: why increment i here? It’s already incremented by the loop, so this appears to skip elements, unless your selectors have duplicates or you have some other reason to do this.
- span.a-size-medium doesn’t work for me, which could be locality-specific. I see a span.a-size-base-plus.a-color-base.a-text-normal, but you may need to tweak this to taste.

Here’s a minimal example. I’ll just do the first 2 items from the eBay array since that’s coming through fine.
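A sketch of how the points above could fit together, under a few assumptions: the helper names (buildSearchUrl, readTitles, searchViaBox, searchViaUrl, searchAll) are hypothetical, the /s?k=<query> search URL shape is an assumption about Amazon's site, and the result selector is the one reported above, which may need adjusting for your locale. Each function takes an already-open Puppeteer page, so one tab is reused for every search.

```javascript
// Selectors taken from the thread; the result selector may vary by locale.
const SEARCH_BOX = '#twotabsearchtextbox';
const RESULT_TITLE = 'span.a-size-base-plus.a-color-base.a-text-normal';

// Build the results URL directly to skip loading the home page.
// Assumption: search results are served at /s?k=<query>.
function buildSearchUrl(query, base = 'https://www.amazon.co.uk') {
  const url = new URL('/s', base);
  url.searchParams.set('k', query);
  return url.toString();
}

// Wait for the result titles to exist, then collect their text.
async function readTitles(page) {
  await page.waitForSelector(RESULT_TITLE);
  return page.$$eval(RESULT_TITLE, (els) =>
    els.map((el) => el.innerText.trim())
  );
}

// Variant 1: type into the search box. The navigation wait is started
// together with the key press (Promise.all) so the navigation is not missed.
async function searchViaBox(page, homeUrl, query) {
  await page.goto(homeUrl);
  await page.type(SEARCH_BOX, query);
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press('Enter'),
  ]);
  return readTitles(page);
}

// Variant 2: go straight to the search URL, saving one page load.
async function searchViaUrl(page, query) {
  await page.goto(buildSearchUrl(query), { waitUntil: 'domcontentloaded' });
  return readTitles(page);
}

// Reuse a single page for the first `limit` titles; for...of avoids the
// manual index handling that skipped every other title.
async function searchAll(page, titles, limit = 2) {
  const results = [];
  for (const title of titles.slice(0, limit)) {
    results.push(await searchViaUrl(page, title));
  }
  return results;
}
```

With a page from browser.newPage(), calling `await searchAll(page, grabTitles)` and then closing the browser in a finally block keeps the tab count at one instead of one tab per title.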
Output:
Note that I added user agents and headers to be able to use headless: true but it’s incidental to the main solution above. You can return to headless: false or check out canonical threads like How to avoid being detected as bot on Puppeteer and Phantomjs? and Why does headless need to be false for Puppeteer to work? if you have further issues with detection.
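For illustration, the user agent and header setup mentioned above might look roughly like the sketch below. The helper name applyBrowserLikeHeaders, the user agent string, and the Accept-Language value are assumptions chosen for the example, not values from the answer; page.setUserAgent and page.setExtraHTTPHeaders are the Puppeteer calls it relies on.

```javascript
// Example desktop Chrome user agent; an assumed value, swap in your own.
const DEFAULT_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Make a headless page identify itself like a regular browser.
// Call this once, right after browser.newPage() and before page.goto().
async function applyBrowserLikeHeaders(page, userAgent = DEFAULT_UA) {
  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders({ 'Accept-Language': 'en-GB,en;q=0.9' });
  return page;
}
```

This does not guarantee a site will serve a headless browser the same content, which is why the linked threads are worth reading if detection remains an issue.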