I’m trying to obtain the cookies of some web pages. I created a Node.js script using Puppeteer to load the page, interact with it, and then save the cookies in a file.
When I run this directly on my machine, even in headless mode, it works. But when I dockerize it, it does not work for some pages. For example, it worked with Google but not with RentalCars.
Since it is a fair amount of code altogether, I have attached gists with a minimal reproducible example.
Node.js code
https://gist.github.com/omirobarcelo/a459f1b4fa6b47b6fb351eba477564fe
Dockerfile
https://gist.github.com/omirobarcelo/e2f94ac55a2157b6cfeefdf79d6df4ef
Docker Compose
https://gist.github.com/omirobarcelo/d9c42be9172f0152134ff6402bfd25d4
This example loads the page, resolves captchas if there are any, rejects cookies, waits for an arbitrary selector (an input field), and saves the page cookies to a file.
We are also using the packages "puppeteer-extra", "puppeteer-extra-plugin-recaptcha", and "puppeteer-extra-plugin-stealth".
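For reference, a rough sketch of that flow (the full code is in the gists above; the selector, output file name, and the 2Captcha token variable here are placeholders, not the real values):

```javascript
const fs = require('fs');
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

puppeteer.use(StealthPlugin());
puppeteer.use(
  RecaptchaPlugin({
    // Placeholder: the real token comes from the 2Captcha account configuration.
    provider: { id: '2captcha', token: process.env.CAPTCHA_TOKEN },
  })
);

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://www.rentalcars.com', { waitUntil: 'domcontentloaded' });
  await page.solveRecaptchas(); // resolve captchas if there are any
  // ... click the "reject cookies" button here ...
  await page.waitForSelector('input#placeholder-field', { timeout: 120000 }); // arbitrary input field
  const cookies = await page.cookies(); // should include the Reese84 cookie once it is set
  fs.writeFileSync('cookies.json', JSON.stringify(cookies, null, 2));

  await browser.close();
})();
```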
If I try with https://google.com, it works both when I run the code directly on my machine in headless mode and when I dockerize it. But if I try https://www.rentalcars.com, it works when I run the code directly on my machine in headless mode, yet when I dockerize it the page never fully loads, so I cannot resolve captchas or even get the contents of the page through page.content(); it ends up failing with a ProtocolError: Runtime.callFunctionOn timed out.
On some occasions I see that some network requests do not complete, but it is not consistent: sometimes the GTM script loads and other times it does not. In general it seems to have trouble with JavaScript files.
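To see which requests never complete, something like the following can be attached before navigating (a generic Puppeteer snippet, not taken from the gists):

```javascript
// Log requests that fail outright and responses with non-2xx/3xx status codes.
page.on('requestfailed', (request) => {
  console.log('FAILED', request.url(), request.failure()?.errorText);
});
page.on('response', (response) => {
  if (!response.ok()) {
    console.log('HTTP', response.status(), response.url());
  }
});
```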
It is worth noting that the main cookie to obtain is Reese84, which is used for fingerprinting. I do not know whether that could be the problem, or whether I am missing some libraries required for it in the Docker installation. I checked, and all the packages mentioned here and here are installed in the Dockerfile.
These are the versions used:
- Puppeteer: 22.15.0
- Node: 20.13.0
- npm: 10.5.2
- OS: macOS M3
Any help or guidance on whether some packages might be missing in the Docker install would be greatly appreciated.
EDIT1: I tried updating the navigation to

```javascript
await page.goto(url, {
  waitUntil: 'networkidle2',
});
```

and it failed with TimeoutError: Navigation timeout of 120000 ms exceeded.
2 Answers
It happened to me when I used node:alpine as the base image.
What worked for me was switching to node:slim as the base image, which is Debian-based. Try something like the following:
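(A sketch, not my exact Dockerfile; the package names assume a Debian bookworm base and index.js is a placeholder entry point.)

```dockerfile
FROM node:20-slim

# Shared libraries Chrome typically needs on Debian-based images.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       ca-certificates fonts-liberation libasound2 libatk-bridge2.0-0 \
       libatk1.0-0 libcups2 libdbus-1-3 libdrm2 libgbm1 libgtk-3-0 \
       libnspr4 libnss3 libxcomposite1 libxdamage1 libxkbcommon0 \
       libxrandr2 xdg-utils \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .

CMD ["node", "index.js"]
```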
I’m encountering the same issue as others where my Angular application takes up to 5 minutes to load when generating a PDF using either Puppeteer or Playwright. The problem seems to be related to the main.js file of my frontend application, as it is the only file that takes a long time to load.
I initially thought it could be related to the Docker image I was using. When I used my local Docker image, I was able to generate the PDF successfully. However, when I deployed the image to Rancher, the PDF generation failed, and it took up to 5 minutes to load the main.js file.
I’ve tried various solutions, including switching to the Node Slim image and alternating between Puppeteer and Playwright, but neither solution resolved the issue.
Has anyone encountered this problem or know what could be causing the long load time for main.js when generating the PDF, specifically in the Rancher environment?
Any insights or suggestions would be greatly appreciated!