I’m trying to wrap a scraping project in a Docker container to run it on a droplet. The spider scrapes a website and then writes the data to a PostgreSQL database. The database is already running and managed by DigitalOcean.
When I run the command locally to test, everything is fine:
docker compose up
I can see the spider writing to the database.
Then, I use a GitHub Actions workflow to build and push my Docker image to a registry each time I push the code:
name: CI
# 1
# Controls when the workflow will run.
on:
  # Triggers the workflow on push events but only for the master branch
  push:
    branches: [ master ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
    inputs:
      version:
        description: 'Image version'
        required: true
# 2
env:
  REGISTRY: "registry.digitalocean.com/*****-registery"
  IMAGE_NAME: "******-scraper"
  POSTGRES_USERNAME: ${{ secrets.POSTGRES_USERNAME }}
  POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
  POSTGRES_HOSTNAME: ${{ secrets.POSTGRES_HOSTNAME }}
  POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
  POSTGRES_DATABASE: ${{ secrets.POSTGRES_DATABASE }}
  SPLASH_URL: ${{ secrets.SPLASH_URL }}
# 3
jobs:
  build-compose:
    name: Build docker-compose
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
      - name: Login to DO Container Registry with short-lived creds
        run: doctl registry login --expiry-seconds 1200
      - name: Remove all old images
        run: if [ ! -z "$(doctl registry repository list | grep "****-scraper")" ]; then doctl registry repository delete-manifest ****-scraper $(doctl registry repository list-tags ****-scraper | grep -o "sha.*") --force; else echo "No repository"; fi
      - name: Build compose
        run: docker compose -f docker-compose.yaml up -d
      - name: Push to DigitalOcean registry
        run: docker compose push
  deploy:
    name: Deploy from registry to droplet
    runs-on: ubuntu-latest
    needs: build-compose
Then I SSH manually into my droplet (ssh root@ipv4) to install Docker and Docker Compose, and I run the image from the registry with:
# Login to registry
docker login -u DO_TOKEN -p DO_TOKEN registry.digitalocean.com
# Stop running container
docker stop ****-scraper
# Remove old container
docker rm ****-scraper
# Run a new container from a new image
docker run -d --restart always --name ****-scraper registry.digitalocean.com/****-registery/****-scraper
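Note that this docker run command passes no database credentials into the container, so the environment variables the spider reads come up empty. A minimal sketch of forwarding them with an env file — the file path is an assumption, the variable names are taken from the workflow above, and the values are placeholders:

```shell
# Hypothetical env file kept on the droplet (values are placeholders):
cat > /root/scraper.env <<'EOF'
POSTGRES_USERNAME=doadmin
POSTGRES_PASSWORD=change-me
POSTGRES_HOSTNAME=your-db-host.db.ondigitalocean.com
POSTGRES_PORT=25060
POSTGRES_DATABASE=defaultdb
EOF

# --env-file forwards every variable in the file into the container.
docker run -d --restart always --name ****-scraper \
  --env-file /root/scraper.env \
  registry.digitalocean.com/****-registery/****-scraper
```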
As soon as the python script starts on the droplet I have the error:
psycopg2.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting connections
    on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
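This error message means psycopg2 received no hostname at all: when the host is empty, it falls back to the local Unix socket, which does not exist inside the container. A sketch of the failure mode — the variable names mirror the workflow env above, and build_dsn is a hypothetical helper, not code from the project:

```python
def build_dsn(env):
    """Build a PostgreSQL connection URL from environment-style variables."""
    host = env.get("POSTGRES_HOSTNAME")
    if not host:
        # This is the situation behind the "/var/run/postgresql" error:
        # with no host, psycopg2 tries the local Unix socket instead of TCP.
        raise RuntimeError("POSTGRES_HOSTNAME is not set inside the container")
    return "postgresql://{user}:{pw}@{host}:{port}/{db}".format(
        user=env["POSTGRES_USERNAME"],
        pw=env["POSTGRES_PASSWORD"],
        host=host,
        port=env.get("POSTGRES_PORT", "5432"),
        db=env["POSTGRES_DATABASE"],
    )
```

If the variables reach the container, the DSN points at the managed database; if they are missing, failing fast with a clear message is easier to debug than the socket error.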
It seems like I’m doing something wrong, and I can’t figure out how to fix it. I would appreciate some help or an explanation.
Thanks,
My Dockerfile:
# As Scrapy runs on Python, I run the official Python 3 Docker image.
FROM python:3.9.7-slim
# Set the working directory to /usr/src/app.
WORKDIR /usr/src/app
# Install libpq-dev for psycopg2 python package
RUN apt-get update \
    && apt-get -y install libpq-dev gcc
# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
# Install Scrapy specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
# For Splash
EXPOSE 8050
# Run the crawler when the container launches.
CMD [ "python3", "./****/launch_spiders.py" ]
My docker-compose.yaml
version: "3"
services:
  splash:
    image: scrapinghub/splash
    restart: always
    command: --maxrss 2048 --max-timeout 3600 --disable-lua-sandbox --verbosity 1
    ports:
      - "8050:8050"
  launch_spiders:
    restart: always
    build: .
    volumes:
      - .:/usr/src/app
    image: registry.digitalocean.com/****-registery/****-scraper
    depends_on:
      - splash
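For the database credentials to reach the spider’s container, the service also needs to forward them explicitly. A minimal sketch, assuming the values come from a local .env file (Compose reads a .env file in the project directory by default); the service name and variable names are taken from the files above:

```yaml
  launch_spiders:
    restart: always
    build: .
    image: registry.digitalocean.com/****-registery/****-scraper
    depends_on:
      - splash
    environment:
      - POSTGRES_USERNAME=${POSTGRES_USERNAME}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_HOSTNAME=${POSTGRES_HOSTNAME}
      - POSTGRES_PORT=${POSTGRES_PORT}
      - POSTGRES_DATABASE=${POSTGRES_DATABASE}
      - SPLASH_URL=${SPLASH_URL}
```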
2 Answers
Problem solved!
The .env file with all my credentials was listed in .dockerignore. It was therefore excluded from the build context, so the .env file could not be found when building the image.
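The effect can be reproduced in isolation; a small sketch (the directory and file contents are illustrative, not from the project):

```shell
# Build-context demo: anything matched by .dockerignore never reaches the
# Docker daemon, so "COPY . ." cannot place .env into the image.
mkdir -p /tmp/demo-ctx
cd /tmp/demo-ctx
printf 'POSTGRES_PASSWORD=placeholder\n' > .env
printf '.env\n' > .dockerignore
# A pattern identical to the file name excludes it from the context:
if grep -Fqx '.env' .dockerignore; then
  echo ".env is excluded from the build context"
fi
```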
Try installing the prebuilt psycopg2-binary package instead of psycopg2. Then you don’t need gcc or libpq-dev in the image. You probably also have mismatched PostgreSQL versions.
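With psycopg2-binary in requirements.txt, the Dockerfile from the question can drop the compiler toolchain entirely; a sketch under that assumption:

```dockerfile
FROM python:3.9.7-slim
WORKDIR /usr/src/app
COPY requirements.txt ./
# psycopg2-binary bundles a precompiled libpq, so gcc and libpq-dev
# are no longer needed at build time.
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
# For Splash
EXPOSE 8050
CMD [ "python3", "./****/launch_spiders.py" ]
```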