I have built a scraper using Puppeteer and Node.js, and now I want to dockerize it. I’ve tried several approaches, but I keep running into an error when Puppeteer tries to launch the browser for scraping.
I’ve tried updating this Dockerfile in several ways (adding Chrome, adding Puppeteer), but none of them worked.
My current basic Dockerfile, without Puppeteer or any other dependencies:
# Use Node.js runtime as the base image
FROM node:18
# Set the working directory in the container
WORKDIR /usr/src/app
# Copy package.json and package-lock.json to the working directory
COPY package*.json ./
# Install dependencies
RUN npm install
# Copy the rest of the application code
COPY . .
# Expose the port the app runs on
EXPOSE 8080
# Command to run the application
CMD ["node", "scraper.js"]
Code:
Snippet that launches the browser:
// Launch browser
const browser = await launch({ headless: true, defaultViewport: null });
Can someone help me figure out how to get this working?
I have tried every approach from here, here and here.
Error encountered:
An error occurred during scraping:
Error: Failed to launch the browser process!
web-crawler-1 | rosetta error: failed to open elf at /lib64/ld-linux-x86-64.so.2
web-crawler-1 |
web-crawler-1 |
web-crawler-1 |
web-crawler-1 | TROUBLESHOOTING: https://pptr.dev/troubleshooting
web-crawler-1 |
web-crawler-1 | at Interface.onClose (file:///usr/src/app/node_modules/@puppeteer/browsers/lib/esm/launch.js:301:24)
web-crawler-1 | at Interface.emit (node:events:529:35)
web-crawler-1 | at Interface.close (node:internal/readline/interface:534:10)
web-crawler-1 | at Socket.onend (node:internal/readline/interface:260:10)
web-crawler-1 | at Socket.emit (node:events:529:35)
web-crawler-1 | at endReadableNT (node:internal/streams/readable:1400:12)
web-crawler-1 | at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
2 Answers
This solution worked for me.
To run Puppeteer inside a Docker container you should install Google Chrome manually because, in contrast to the Chromium package offered by Debian, Chrome only offers the latest stable version.
Install the browser in the Dockerfile:
Additionally, if you are on an ARM-based CPU (such as Apple M1) like me, you should pass the --platform linux/amd64 argument when you build the Docker image.
Build command:
docker build --platform linux/amd64 -t <image-name> .
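With Chrome installed in the image, the launch call from the question has to be pointed at that binary. The following is a minimal sketch rather than a definitive setup. It assumes Chrome was installed from the google-chrome-stable package, which normally places the binary at /usr/bin/google-chrome-stable. buildLaunchOptions and the CHROME_PATH environment variable are hypothetical names used only for illustration. The two sandbox flags are also an assumption: they are commonly needed when the container runs as root, as the Dockerfile in the question does.

```javascript
// Hypothetical helper: builds the options object handed to Puppeteer's launch().
// CHROME_PATH is an assumed environment variable name chosen for this sketch.
function buildLaunchOptions(env = process.env) {
  return {
    headless: true,
    defaultViewport: null,
    // Assumed install location of the google-chrome-stable Debian package.
    executablePath: env.CHROME_PATH || '/usr/bin/google-chrome-stable',
    // Assumption: the container runs as root, where Chrome's sandbox
    // typically refuses to start unless it is disabled.
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };
}

// Usage inside the scraper (requires the puppeteer package):
//   import { launch } from 'puppeteer';
//   const browser = await launch(buildLaunchOptions());
console.log(buildLaunchOptions({}));
```

Keeping the path in an environment variable means the same script can still run outside Docker, where Chrome may live somewhere else.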
Note: After updating your Dockerfile, make sure to update the Puppeteer script as well: when launching the browser, set the executable path to the Chrome binary that was just installed in the image.

Parv’s solution worked for me in my local Docker setup but not in an Azure Kubernetes Service (AKS) cluster.
That’s my final solution:
Thanks to https://github.com/puppeteer/puppeteer/issues/11023#issuecomment-1776247197
My Dockerfile:
Snippet of the Kubernetes deployment YAML: