
Hello, I have to build a Docker image for the following bioinformatics tool: https://github.com/CAMI-challenge/CAMISIM. Their Dockerfile works but takes a long time to build, and I would like to build my own, slightly differently, to learn. I am facing issues: there are several Python scripts that I should be able to choose between at run time, not only a main one. If I add one script in particular as the ENTRYPOINT, the behavior isn't exactly what it should be.

The Dockerfile:

FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
USER root
#COPY ./install_docker.sh ./
#RUN chmod +x ./install_docker.sh && sh ./install_docker.sh
RUN apt-get update && \
    apt install -y git python3-pip libxml-simple-perl libncursesw5 && \
    git clone https://github.com/CAMI-challenge/CAMISIM.git && \
    pip3 install numpy ete3 biom-format biopython matplotlib joblib scikit-learn
ENTRYPOINT ["python3"]
ENV PATH="/CAMISIM/:${PATH}"

This yields:

sudo docker run camisim:latest metagenomesimulation.py --help
python3: can't open file 'metagenomesimulation.py': [Errno 2] No such file or directory

Adding that script as an ENTRYPOINT after python3 allows me to use it with 2 drawbacks: I cannot use another script (I could build a second docker image but that would be a bad solution), and it outputs:

ERROR: 0
usage: python metagenomesimulation.py configuration_file_path

    #######################################
    #    MetagenomeSimulationPipeline     #
    #######################################

    Pipeline for the simulation of a metagenome

optional arguments:
  -h, --help            show this help message and exit
  -silent, --silent     Hide unimportant Progress Messages.
  -debug, --debug_mode  more information, also temporary data will not be deleted
  -log LOGFILE, --logfile LOGFILE
                        output will also be written to this log file

optional config arguments:
  -seed SEED            seed for random number generators
  -s {0,1,2}, --phase {0,1,2}
                        available options: 0,1,2. Default: 0
                        0 -> Full run,
                        1 -> Only Comunity creation,
                        2 -> Only Readsimulator
  -id DATA_SET_ID, --data_set_id DATA_SET_ID
                        id of the dataset, part of prefix of read/contig sequence ids
  -p MAX_PROCESSORS, --max_processors MAX_PROCESSORS
                        number of available processors

required:
  config_file           path to the configuration file

You can see there is an error that shouldn't be there: the script does not actually honor the help flag. The original Dockerfile is:

FROM ubuntu:20.04

RUN apt update
RUN apt install -y python3 python3-pip perl libncursesw5
RUN perl -MCPAN -e 'install XML::Simple'
ADD requirements.txt /requirements.txt
RUN cat requirements.txt | xargs -n 1 pip install
ADD *.py /usr/local/bin/
ADD scripts /usr/local/bin/scripts
ADD tools /usr/local/bin/tools
ADD defaults /usr/local/bin/defaults
WORKDIR /usr/local/bin
ENTRYPOINT ["python3"]

It works but shows the same error as above, so it does not really work. That error is not present when using the tool outside of Docker. The last time I made a Docker image, I just pulled the git repo and added the main .sh script as an ENTRYPOINT, and everything worked despite being more complex (see https://github.com/Louis-MG/Metadbgwas).

Why would I need to use ADD and move everything? I added the git folder to the PATH, so why can't the scripts be found? How is this different from the Metadbgwas image?

2 Answers


  1. Chosen as BEST ANSWER

    In the end very little was required. The original Dockerfile was correct: the same error is displayed anyway, and it comes from the script itself. What was missing was a link to the interpreter, so that I could remove the ENTRYPOINT and run the script directly instead of passing it as a file argument to python3, which does not search PATH for it. The Dockerfile:

    FROM ubuntu:20.04
    ENV DEBIAN_FRONTEND=noninteractive
    USER root
    
    RUN ln -s /usr/bin/python3 /usr/bin/python
    
    RUN apt-get update && \
            apt install -y git python3-pip libxml-simple-perl libncursesw5 && \
            git clone https://github.com/CAMI-challenge/CAMISIM.git && \
            pip3 install numpy ete3 biom-format biopython matplotlib joblib scikit-learn
    ENV PATH="/CAMISIM:${PATH}"
    

    Trying WORKDIR as suggested instead of the PATH yielded an error.


  2. In your first setup, you start in the image root directory / and run git clone to check out the repository into /CAMISIM. You never change the current directory, though, so when you try to run python3 metagenomesimulation.py --help it’s looking in / and not /CAMISIM, hence the "not found" error.
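
The difference between "found through PATH" and "found in the current directory" can be reproduced without Docker. The following is only a sketch under stated assumptions: `tool_dir` and `tool.sh` are hypothetical stand-ins for /CAMISIM and one of its scripts, and `cat` stands in for python3, since both simply open the file name they are given relative to the current directory.

```shell
#!/bin/sh
# Stand-in for /CAMISIM: a scratch directory holding one executable script.
# (tool_dir and tool.sh are hypothetical names used only for this demo.)
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "tool ran"\n' > "$tool_dir/tool.sh"
chmod +x "$tool_dir/tool.sh"

# Equivalent of ENV PATH="/CAMISIM/:${PATH}" in the Dockerfile.
PATH="$tool_dir:$PATH"
export PATH

# Start somewhere else, like the container starting in /.
cd /

# 1) Used as a command name, the script is located through PATH.
by_name=$(tool.sh)
echo "run by name:      $by_name"

# 2) Used as a file argument, the name is resolved against the current
#    directory; cat (like python3) does not consult PATH, so this fails.
if cat tool.sh >/dev/null 2>&1; then
    as_argument="found"
else
    as_argument="not found"
fi
echo "passed as a file: $as_argument"

# 3) After changing directory (what WORKDIR does), the argument resolves.
cd "$tool_dir"
if cat tool.sh >/dev/null 2>&1; then
    after_cd="found"
else
    after_cd="not found"
fi
echo "after cd:         $after_cd"
```

Case 2 corresponds to `ENTRYPOINT ["python3"]` with the script name passed as an argument from /, and case 3 to the same command after WORKDIR /CAMISIM.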

    You can fix this just by changing the current directory. At any point after you check out the repository, run

    WORKDIR /CAMISIM
    

    You should also delete the ENTRYPOINT line. For each of the scripts you could run as a top-level entry point, check two things:

    1. Is it executable? If you run ls -l metagenomesimulation.py, is there an x in the permission listing? If not, on the host system, run chmod +x metagenomesimulation.py and commit to source control. (Or you could RUN chmod ... in the Dockerfile if you really can't change the repository.)
    2. Does it have a "shebang" line? The very first line of the script should be
      #!/usr/bin/env python3
      

    If both of these things are true, then you can just run ./metagenomesimulation.py without explicitly saying python3; since you add the directory to $PATH as well, you can probably run it without specifying the ./... file location.
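
Both checks can be scripted if there are many candidate scripts. This is a rough sketch, not part of CAMISIM or Docker: the function name `check_entry_script` is made up for illustration, and it only tests that the first line starts with `#!`, not that the interpreter named there exists.

```shell
#!/bin/sh
# check_entry_script FILE
# Reports whether FILE can be launched directly: it must have an executable
# bit set and a "#!" shebang as its first line.
# Returns 0 if both hold, 1 if either check fails, 2 if FILE is missing.
# (Hypothetical helper written for this answer.)
check_entry_script() {
    file=$1
    status=0
    if [ ! -f "$file" ]; then
        echo "$file: not found"
        return 2
    fi
    if [ ! -x "$file" ]; then
        echo "$file: not executable (try: chmod +x $file)"
        status=1
    fi
    # Look only at the first line and test whether it begins with "#!".
    first_line=$(head -n 1 "$file")
    case $first_line in
        '#!'*) ;;
        *)
            echo "$file: no shebang line (expected e.g. #!/usr/bin/env python3)"
            status=1
            ;;
    esac
    if [ "$status" -eq 0 ]; then
        echo "$file: ok"
    fi
    return "$status"
}

# Check every file named on the command line.
for script in "$@"; do
    check_entry_script "$script" || true
done
```

Saved under some name of your choosing and run from the repository root with the script paths as arguments (for example `*.py`), it prints one or two lines per file that needs fixing and `ok` for the rest.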

    (Probably deleting the ENTRYPOINT line on its own is enough, given that ENV PATH setting, but your script still might be confused by starting up in the wrong directory.)

    The long "help" output just suggests to me that the script is expecting a configuration file name as a parameter and you haven’t provided it, or else you’ve repeated the script name in both the entrypoint and command parts of the container command string.
