skip to Main Content

I’m trying to wrap a scraping project in a Docker container to run it on a droplet. The spider scraps a website and then writes the data to a postgres database. The postgres database is already running and managed by Digitalocean.

When I run the command locally to test, everything is fine:

docker compose up

I can visualize the spider writing on the database.

Then, I use github action to build and push my docker image on a registry each time I push the code with the script:

name: CI

# 1
# Controls when the workflow will run.
on:
  # Triggers the workflow on push events but only for the master branch
  push:
    branches: [ master ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
    inputs:
      version:
        description: 'Image version'
        required: true
#2
env:
  REGISTRY: "registry.digitalocean.com/*****-registery"
  IMAGE_NAME: "******-scraper"
  POSTGRES_USERNAME: ${{ secrets.POSTGRES_USERNAME }}
  POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
  POSTGRES_HOSTNAME: ${{ secrets.POSTGRES_HOSTNAME }}
  POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
  POSTGRES_DATABASE: ${{ secrets.POSTGRES_DATABASE }}
  SPLASH_URL: ${{ secrets.SPLASH_URL }}

#3
jobs:
  build-compose:
    name: Build docker-compose
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2

    - name: Insall doctl
      uses: digitalocean/action-doctl@v2
      with:
        token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}

    - name: Login to DO Container Registry with short-lived creds
      run: doctl registry login --expiry-seconds 1200

    - name: Remove all old images
      run: if [ ! -z "$(doctl registry repository list | grep "****-scraper")" ]; then doctl registry repository delete-manifest ****-scraper $(doctl registry repository list-tags ****-scraper | grep -o "sha.*") --force; else echo "No repository"; fi

    - name: Build compose
      run: docker compose -f docker-compose.yaml up -d

    - name: Push to Digital Ocean registery
      run: docker compose push

  deploy:
    name: Deploy from registery to droplet
    runs-on: ubuntu-latest
    needs: build-compose

Then I ssh root@ipv4 manually to my droplet in order to install docker, docker compose and run the image from the registry with:

# Login to registry
docker login -u DO_TOKEN -p DO_TOKEN registry.digitalocean.com
# Stop running container
docker stop ****-scraper
# Remove old container
docker rm ****-scraper
# Run a new container from a new image
docker run -d --restart always --name ****-scraper registry.digitalocean.com/****-registery/****-scraper

As soon as the python script starts on the droplet I have the error:

psycopg2.OperationalError: could not connect to server: No such file
or directory Is the server running locally and accepting connections
on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

It seems like I’m doing something wrong and I can’t find how to fix this so far.
I would appreciate some help explanations.

Thanks,

My Dockerfile:

# As Scrapy runs on Python, I run the official Python 3 Docker image.
FROM python:3.9.7-slim

# Set the working directory to /usr/src/app.
WORKDIR /usr/src/app

# Install libpq-dev for psycopg2 python package
RUN apt-get update 
    && apt-get -y install libpq-dev gcc

# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./

# Install Scrapy specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .

# For Slash
EXPOSE 8050

# Run the crawler when the container launches.
CMD [ "python3", "./****/launch_spiders.py" ]

My docker-compose.yaml

version: "3"

services:
  splash:
    image: scrapinghub/splash
    restart: always
    command: --maxrss 2048 --max-timeout 3600 --disable-lua-sandbox --verbosity 1
    ports:
      - "8050:8050"
  launch_spiders:
    restart: always
    build: .
    volumes:
      - .:/usr/src/app
    image: registry.digitalocean.com/****-registery/****-scraper
    depends_on:
      - splash

2

Answers


  1. Chosen as BEST ANSWER

    Problem solved!

    The .env file with all my credentials was in the .dockerignore. It was then impossible to locate this .env when building the image.


  2. Try installing binary packages of psycopg2-binary instead of psycopg2. Then you don’t need gcc and libpq-dev. Probably you have mixed versions of postgreSQL.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search