I am new to Docker and have built a custom container to run my spider on my cloud server. My scraper is built with Python 3.6, Scrapy 1.6, and Selenium, and Docker runs everything in one container. When the spider starts, Scrapy's open_spider method runs another Python script in the same directory that generates the URLs for Scrapy to crawl. The script saves the links in a text file, but I am getting PermissionError: [Errno 13] Permission denied: 'tmp'.
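Roughly, the hook looks like this (the pipeline class and script name below are placeholders, not my exact code):

```python
import subprocess
import sys


class UrlGeneratorPipeline:
    """Placeholder pipeline: runs the URL-generating script when the spider opens."""

    def __init__(self, script="generate_urls.py"):  # script name is a placeholder
        self.script = script

    def open_spider(self, spider):
        # The script writes the generated links into a text file under tmp/;
        # this write is what raises PermissionError inside the container.
        subprocess.run([sys.executable, self.script], check=True)
```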
I have attempted to run chmod 777 and chmod a+rw on the tmp folder so it will allow me to create the text file, but I am still getting the permission denied error. I have been researching this for days and cannot figure out how to fix it.
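To see what is going on, a quick check of whether the current user can actually create files in the directory can help; a minimal sketch (the container paths are examples, run it inside the container via something like `docker run --rm -it <image> sh`):

```shell
#!/bin/sh
# check_writable: report whether a directory is writable by the current user.
# Inside the container you would call it on /app/cars/tmp (path is an example).
check_writable() {
  if [ -w "$1" ] && [ -x "$1" ]; then
    echo "writable"
  else
    echo "not-writable (uid=$(id -u))"
  fi
}
```

Note that creating a file in a directory also requires the execute bit on the directory itself, which `a+rw` does not add.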
The OS on my laptop is Ubuntu 18.04.
Below is my Dockerfile:
FROM scrapinghub/scrapinghub-stack-scrapy:1.6-py3
RUN apt-get -y --no-install-recommends install zip unzip jq libxml2 libxml2-dev
RUN printf "deb http://archive.debian.org/debian/ jessie main\ndeb-src http://archive.debian.org/debian/ jessie main\ndeb http://security.debian.org jessie/updates main\ndeb-src http://security.debian.org jessie/updates main" > /etc/apt/sources.list
#============================================
# Google Chrome
#============================================
# can specify versions by CHROME_VERSION;
# e.g. google-chrome-stable=53.0.2785.101-1
# google-chrome-beta=53.0.2785.92-1
# google-chrome-unstable=54.0.2840.14-1
# latest (equivalent to google-chrome-stable)
# google-chrome-beta (pull latest beta)
#============================================
ARG CHROME_VERSION="google-chrome-stable"
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
  && apt-get update -qqy \
  && apt-get -qqy install \
     ${CHROME_VERSION:-google-chrome-stable} \
  && rm /etc/apt/sources.list.d/google-chrome.list \
  && rm -rf /var/lib/apt/lists/* /var/cache/apt/*
#============================================
# Chrome Webdriver
#============================================
# can specify versions by CHROME_DRIVER_VERSION
# Latest released version will be used by default
#============================================
ARG CHROME_DRIVER_VERSION
RUN CHROME_STRING=$(google-chrome --version) \
  && CHROME_VERSION_STRING=$(echo "${CHROME_STRING}" | grep -oP "\d+\.\d+\.\d+\.\d+") \
  && CHROME_MAYOR_VERSION=$(echo "${CHROME_VERSION_STRING%%.*}") \
  && wget --no-verbose -O /tmp/LATEST_RELEASE "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAYOR_VERSION}" \
  && CD_VERSION=$(cat "/tmp/LATEST_RELEASE") \
  && rm /tmp/LATEST_RELEASE \
  && if [ -z "$CHROME_DRIVER_VERSION" ]; \
     then CHROME_DRIVER_VERSION="${CD_VERSION}"; \
     fi \
  && CD_VERSION=$(echo $CHROME_DRIVER_VERSION) \
  && echo "Using chromedriver version: "$CD_VERSION \
  && wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CD_VERSION/chromedriver_linux64.zip \
  && rm -rf /opt/selenium/chromedriver \
  && unzip /tmp/chromedriver_linux64.zip -d /opt/selenium \
  && rm /tmp/chromedriver_linux64.zip \
  && mv /opt/selenium/chromedriver /opt/selenium/chromedriver-$CD_VERSION \
  && chmod 755 /opt/selenium/chromedriver-$CD_VERSION \
  && sudo ln -fs /opt/selenium/chromedriver-$CD_VERSION /usr/bin/chromedriver
#============================================
# crawlera-headless-proxy
#============================================
RUN curl -L https://github.com/scrapinghub/crawlera-headless-proxy/releases/download/1.1.1/crawlera-headless-proxy-linux-amd64 -o /usr/local/bin/crawlera-headless-proxy \
  && chmod +x /usr/local/bin/crawlera-headless-proxy
RUN chmod a+rw app/cars/spiders
RUN chmod a+rw app/cars/tmp
COPY ./start-crawl /usr/local/bin/start-crawl
ENV TERM xterm
ENV SCRAPY_SETTINGS_MODULE cars.settings
RUN pip install --upgrade pip
RUN mkdir -p /app
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
RUN python setup.py install
RUN chmod a+rw app/cars/tmp
Here is my setup.py file:
# Automatically created by: shub deploy
from setuptools import setup, find_packages
setup(
    name='cars',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = cars.settings']},
)
2 Answers
Maybe the setup.py file from the RUN python setup.py install line needs to be executed with appropriate permissions? Add the following to your Dockerfile:
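For example, create and switch to a non-root user so files under /app are owned by it (the user name scrapyuser is just an example):

```dockerfile
# Create an unprivileged user; "scrapyuser" is an example name.
RUN adduser --disabled-login scrapyuser \
    && chown -R scrapyuser:scrapyuser /app

# Run everything after this point (including the crawl) as that user,
# so files created under /app/cars/tmp are owned by it.
USER scrapyuser
```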
Note: if --disabled-login also doesn't work, use --disabled-password.