Hello, I have to build a Docker image for the following bioinformatics tool: https://github.com/CAMI-challenge/CAMISIM. Their Dockerfile works but takes a long time to build, and I would like to build my own, slightly differently, to learn. I am facing some issues: there are several Python scripts that I should be able to choose from at run time, not only a main one. If I add one script in particular as an ENTRYPOINT, then the behavior isn’t exactly what it should be.
The Dockerfile:
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
USER root
#COPY ./install_docker.sh ./
#RUN chmod +x ./install_docker.sh && sh ./install_docker.sh
RUN apt-get update && \
    apt install -y git python3-pip libxml-simple-perl libncursesw5 && \
    git clone https://github.com/CAMI-challenge/CAMISIM.git && \
    pip3 install numpy ete3 biom-format biopython matplotlib joblib scikit-learn
ENTRYPOINT ["python3"]
ENV PATH="/CAMISIM/:${PATH}"
This yields:
sudo docker run camisim:latest metagenomesimulation.py --help
python3: can't open file 'metagenomesimulation.py': [Errno 2] No such file or directory
Adding that script to the ENTRYPOINT after python3 allows me to use it, with two drawbacks: I cannot use another script (I could build a second Docker image, but that would be a bad solution), and it outputs:
ERROR: 0
usage: python metagenomesimulation.py configuration_file_path
#######################################
# MetagenomeSimulationPipeline #
#######################################
Pipeline for the simulation of a metagenome
optional arguments:
-h, --help show this help message and exit
-silent, --silent Hide unimportant Progress Messages.
-debug, --debug_mode more information, also temporary data will not be deleted
-log LOGFILE, --logfile LOGFILE
output will also be written to this log file
optional config arguments:
-seed SEED seed for random number generators
-s {0,1,2}, --phase {0,1,2}
available options: 0,1,2. Default: 0
0 -> Full run,
1 -> Only Comunity creation,
2 -> Only Readsimulator
-id DATA_SET_ID, --data_set_id DATA_SET_ID
id of the dataset, part of prefix of read/contig sequence ids
-p MAX_PROCESSORS, --max_processors MAX_PROCESSORS
number of available processors
required:
config_file path to the configuration file
You can see there is an error that shouldn’t be there; it does not actually use the help flag. The original Dockerfile is:
FROM ubuntu:20.04
RUN apt update
RUN apt install -y python3 python3-pip perl libncursesw5
RUN perl -MCPAN -e 'install XML::Simple'
ADD requirements.txt /requirements.txt
RUN cat requirements.txt | xargs -n 1 pip install
ADD *.py /usr/local/bin/
ADD scripts /usr/local/bin/scripts
ADD tools /usr/local/bin/tools
ADD defaults /usr/local/bin/defaults
WORKDIR /usr/local/bin
ENTRYPOINT ["python3"]
It works but shows the same error as above, so not really. That error does not appear when using the tool outside of Docker. Last time I made a Docker image, I just pulled the git repo and added the main .sh script as an ENTRYPOINT, and everything worked despite being more complex (see https://github.com/Louis-MG/Metadbgwas).
Why would I need ADD and to move everything? I added the git folder to the PATH, so why can’t the scripts be found? How is this different from the Metadbgwas image?
2 Answers
In the end very little was required and the original Dockerfile was correct. The same error is displayed anyway; it is due to the script itself. What was missing was a link to the interpreter, so that I could remove the ENTRYPOINT and actually run the script instead of having python look for it in its own path. The Dockerfile:
Trying WORKDIR as suggested instead of the PATH
yielded an error.

In your first setup, you start in the image root directory / and run git clone to check out the repository into /CAMISIM. You never change the current directory, though, so when you try to run python3 metagenomesimulation.py --help it’s looking in / and not /CAMISIM, hence the "not found" error. The PATH setting does not help here: PATH is only searched for the command itself (python3), not for the script file passed to it as an argument.

You can fix this just by changing the current directory. At any point after you check out the repository, add this line to the Dockerfile:
WORKDIR /CAMISIM
You should also delete the ENTRYPOINT line. For each of the scripts you could run as a top-level entry point, check two things. First, does the script begin with an interpreter ("shebang") line such as #!/usr/bin/env python3? Second, if you run ls -l metagenomesimulation.py, are there x in the permission listing? If not, on the host system, run chmod +x metagenomesimulation.py and commit to source control. (Or you could RUN chmod ... in the Dockerfile if you really can’t change the repository.)

If both of these things are true, then you can just run ./metagenomesimulation.py without explicitly saying python3; since you add the directory to $PATH as well, you can probably run it without specifying the ./... file location.

(Probably deleting the ENTRYPOINT line on its own is enough, given that ENV PATH setting, but your script still might be confused by starting up in the wrong directory.)

The long "help" output just suggests to me that the script is expecting a configuration file name as a parameter and you haven’t provided it, or else you’ve repeated the script name in both the entrypoint and command parts of the container command string.
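
Putting these suggestions together, a Dockerfile along the following lines might work. This is only a sketch based on the Dockerfile from the question, not a tested build: it assumes the top-level scripts live directly in /CAMISIM, and the chmod step is an assumption in case the repository does not already mark them executable.

```dockerfile
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive

# Install the system packages, clone the repository into /CAMISIM,
# and install the Python dependencies listed in the question.
RUN apt-get update && \
    apt-get install -y git python3-pip libxml-simple-perl libncursesw5 && \
    git clone https://github.com/CAMI-challenge/CAMISIM.git /CAMISIM && \
    pip3 install numpy ete3 biom-format biopython matplotlib joblib scikit-learn

# Start every container inside the repository, so a relative script name
# such as metagenomesimulation.py resolves against /CAMISIM instead of /.
WORKDIR /CAMISIM

# Assumption: the top-level scripts may lack the executable bit.
# This is harmless if they already have it.
RUN chmod +x /CAMISIM/*.py

# Let the scripts be found by name when they are run as commands.
ENV PATH="/CAMISIM:${PATH}"

# Deliberately no ENTRYPOINT: the script to run is chosen per container.
```

With an image built from this sketch, docker run camisim:latest python3 metagenomesimulation.py --help should find the script because python3 resolves the file name against the working directory. Running docker run camisim:latest metagenomesimulation.py --help without python3 should also work, but only if the script starts with a python3 interpreter line. Any other top-level script can be substituted in the same way.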