
I’ve created a Docker image using the following Dockerfile:

FROM ubuntu:18.04
ARG DEBIAN_FRONTEND=noninteractive

WORKDIR /usr/local/src

# Setting up general environment
RUN apt-get -y update \
    && apt-get install -y build-essential \
    && apt-get install -y wget \
    && apt-get install -y hmmer \
    && apt-get install -y git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

## Installing miniconda
ENV CONDA_DIR /opt/conda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda
ENV PATH=$CONDA_DIR/bin:$PATH

# Installing NLRexpress
RUN git clone https://github.com/eliza-m/NLRexpress 
WORKDIR /usr/local/src/NLRexpress

# Setting up the conda environment and required variables 
RUN conda env create -f environment.yml && \
    conda init bash && \
    echo "conda activate nlrexpress" >> ~/.bashrc

ENV PATH /opt/conda/envs/nlrexpress/bin:/usr/local/src/NLRexpress:$PATH
ENV CONDA_DEFAULT_ENV nlrexpress

RUN wget https://nlrexpress.biochim.ro/datasets/models.tar.gz && \
    tar -xf models.tar.gz && \
    rm models.tar.gz

RUN echo "#!/bin/bash \n python nlrexpress.py" > nlrexpress && \
    chmod +x nlrexpress

I avoided using a CMD instruction and instead made an executable nlrexpress wrapper, because I want to use this image for hmmsearch too.

The image builds fine, and when I tested docker run nlrexpress:latest nlrexpress I got the expected output:

Usage: nlrexpress.py [OPTIONS]
Try 'nlrexpress.py --help' for help.

Error: Missing option '--input'.

However, when I use the container with Nextflow, I get the following error:

python: can't open file '/path/to/workDir/ea/72bd9e660d0ce79944d8bdde3dd024/nlrexpress.py': [Errno 2] No such file or directory

Here is the nextflow process:

process NLRexpress {
  tag "$sample_id"
  publishDir params.PlantDir
  maxForks 1
  container = 'dthorbur1990/nlrexpress:latest'
  executor = "local"

  input:
      tuple val(sample_id), path(peptides)

  output:
      path "*.short.output.txt" 

  script:
  """
  mkdir output
  nlrexpress \
        --input ../${peptides} \
        --outdir ./output \
        --module ${params.NE_Modules}

  mv output/*.short.output.txt ./
  """
}

How can I ensure that files in the container’s WORKDIR are available when Nextflow runs the container? I’ve tried setting ENV variables, but this doesn’t seem to work either. I thought that because WORKDIR is set, the container would always start in the WORKDIR path and all the files would be available.
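A likely explanation, judging from the error message: the wrapper calls python nlrexpress.py with a relative path, and a relative path is resolved against the current working directory at run time, not against the directory the file was installed in. A plain docker run starts in the image's WORKDIR, so the manual test finds the script; the path in the Nextflow error shows the command ran in the task's work directory instead. The sketch below reproduces the difference without Docker; the temporary directories and the one-line nlrexpress.sh stand-in for nlrexpress.py are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: a relative script path is resolved against the current working
# directory, not against the directory the script was installed in.

repo_dir=$(mktemp -d)   # stands in for /usr/local/src/NLRexpress (the image WORKDIR)
task_dir=$(mktemp -d)   # stands in for a Nextflow task work directory

# Hypothetical stand-in for nlrexpress.py: a one-line shell script.
echo 'echo "script ran"' > "$repo_dir/nlrexpress.sh"

# Like a plain `docker run`: the shell starts in WORKDIR, so the relative path resolves.
from_workdir=$(cd "$repo_dir" && sh nlrexpress.sh 2>&1)

# Like the Nextflow task: the shell starts in the task directory, so the same relative path fails.
from_taskdir=$(cd "$task_dir" && sh nlrexpress.sh 2>&1) || from_taskdir="FAILED: $from_taskdir"

# An absolute path resolves from any directory.
absolute=$(cd "$task_dir" && sh "$repo_dir/nlrexpress.sh" 2>&1)

printf '%s\n' "from WORKDIR:  $from_workdir" "from task dir: $from_taskdir" "absolute path: $absolute"
rm -rf "$repo_dir" "$task_dir"
```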

I’ve found I can just clone the repo into the Nextflow working directory, but this isn’t an ideal workaround, as I would also have to download the models for each process. The same issue applies to the models directory I downloaded into the container.

Edit: Just adding that hmmsearch works absolutely fine with Nextflow and the container.


Answers


  1. Chosen as BEST ANSWER

    I'm still getting to grips with how Docker works, but I found a solution, in case anyone else has the same problems.

    First, I tried giving the nlrexpress executable the full path to the Python script:

    RUN echo "#!/bin/bash \n python /usr/local/src/NLRexpress/nlrexpress.py" > nlrexpress && \
        chmod +x nlrexpress
    

    But this ended up executing the command and ignoring the input parameters that followed.
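The input parameters are lost because a wrapper script only passes its own arguments on to the command it runs if it forwards them explicitly with "$@". A minimal sketch of the difference, using a hypothetical stub.sh in place of nlrexpress.py (it just reports the arguments it receives) and sh in place of python:

```shell
#!/bin/sh
# Sketch: a wrapper forwards its arguments only if it says so with "$@".
dir=$(mktemp -d)

# Hypothetical stand-in for nlrexpress.py: prints how many arguments it got and what they were.
cat > "$dir/stub.sh" <<'EOF'
echo "$# args: $*"
EOF

# Wrapper like the one above: arguments given to the wrapper are silently dropped.
printf '#!/bin/sh\nsh %s\n' "$dir/stub.sh" > "$dir/wrapper_drop"

# Wrapper that forwards every argument unchanged, preserving word boundaries.
printf '#!/bin/sh\nsh %s "$@"\n' "$dir/stub.sh" > "$dir/wrapper_forward"

chmod +x "$dir/wrapper_drop" "$dir/wrapper_forward"

dropped=$("$dir/wrapper_drop" --input peptides.fasta --outdir ./output)
forwarded=$("$dir/wrapper_forward" --input peptides.fasta --outdir ./output)

echo "dropping wrapper:   $dropped"
echo "forwarding wrapper: $forwarded"
rm -rf "$dir"
```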

    Instead, I added a shebang line naming the required Python binary, so the script itself can be executed directly without having to write python script.py.

    RUN sed -i '1i #!/opt/conda/envs/nlrexpress/bin/python' nlrexpress.py && \
        chmod +x nlrexpress.py
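This works because the question's Dockerfile already adds /usr/local/src/NLRexpress to PATH, so once nlrexpress.py starts with a shebang and has the execute bit set, the shell finds it by name from any working directory and hands the arguments straight to the interpreter. A sketch of the same steps outside Docker, assuming GNU sed (for the one-line 1i form), using /bin/sh in the shebang in place of the conda Python, and a hypothetical tool.sh standing in for nlrexpress.py:

```shell
#!/bin/sh
# Sketch: prepend a shebang with sed, make the script executable, and call it
# by name through PATH from an unrelated directory.
tool_dir=$(mktemp -d)   # stands in for /usr/local/src/NLRexpress
task_dir=$(mktemp -d)   # stands in for a Nextflow task work directory

# Hypothetical stand-in for nlrexpress.py, initially without a shebang line.
echo 'echo "got: $*"' > "$tool_dir/tool.sh"

# Same idea as the answer: insert an interpreter line before line 1 (GNU sed syntax).
sed -i '1i #!/bin/sh' "$tool_dir/tool.sh"
chmod +x "$tool_dir/tool.sh"

# With the directory on PATH, the script runs by name from anywhere and receives its arguments.
result=$(cd "$task_dir" && PATH="$tool_dir:$PATH" && tool.sh --input peptides.fasta)

echo "$result"
first_line=$(head -n 1 "$tool_dir/tool.sh")
rm -rf "$tool_dir" "$task_dir"
```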
    

  2. Creating your own wrapper script, as in your first attempt, is a cleaner and more flexible solution, as long as the wrapper calls the script by its absolute path and forwards its arguments with "$@". Here’s one way to do it using the continuumio/miniconda3 image:

    FROM continuumio/miniconda3
    
    RUN git clone https://github.com/eliza-m/NLRexpress /opt/NLRexpress
    WORKDIR /opt/NLRexpress
    
    RUN conda update conda -y
    
    RUN conda env create -f environment.yml \
        && conda clean --all -y
    
    RUN conda install -c bioconda hmmer \
        && conda clean --all -y
    
    ARG CONDA_ENV_NAME=nlrexpress
    RUN echo "conda activate ${CONDA_ENV_NAME}" >> ~/.bashrc
    ENV PATH "/opt/conda/envs/${CONDA_ENV_NAME}/bin:${PATH}"
    
    ARG NLREXPRESS=/usr/local/bin/nlrexpress
    RUN echo '#!/bin/bash' >> "${NLREXPRESS}" \
        && echo 'python /opt/NLRexpress/nlrexpress.py "$@"' >> "${NLREXPRESS}" \
        && chmod +x "${NLREXPRESS}"
    
    ARG SPLITFASTA=/usr/local/bin/splitFasta
    RUN echo '#!/bin/bash' >> "${SPLITFASTA}" \
        && echo 'python /opt/NLRexpress/splitFasta.py "$@"' >> "${SPLITFASTA}" \
        && chmod +x "${SPLITFASTA}"
    
    RUN wget https://nlrexpress.biochim.ro/datasets/models.tar.gz \
        && tar xf models.tar.gz \
        && rm models.tar.gz
    
    CMD ["bash"]
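Both wrapper steps in this Dockerfile follow the same pattern, so they could be generated by one small shell function. A sketch under stated assumptions: make_wrapper is a hypothetical helper (not part of NLRexpress), a temporary directory stands in for /usr/local/bin, and the script paths are the ones used in the Dockerfile. Because each RUN starts a fresh shell, using this in an image would mean defining the function in the same RUN that calls it, or shipping it as a script.

```shell
#!/bin/sh
# Sketch: generate thin wrappers that run a Python script by absolute path
# and forward every argument unchanged via "$@".
# make_wrapper is a hypothetical helper, not part of NLRexpress.
make_wrapper() {
    target=$1   # where the wrapper is written, e.g. /usr/local/bin/nlrexpress
    script=$2   # absolute path of the script to run
    # Single quotes keep "$@" literal, so it is expanded when the wrapper runs, not now.
    printf '#!/bin/bash\npython %s "$@"\n' "$script" > "$target"
    chmod +x "$target"
}

bin_dir=$(mktemp -d)   # stands in for /usr/local/bin
make_wrapper "$bin_dir/nlrexpress" /opt/NLRexpress/nlrexpress.py
make_wrapper "$bin_dir/splitFasta" /opt/NLRexpress/splitFasta.py

cat "$bin_dir/nlrexpress"
```

Writing with > rather than >> means rerunning the helper replaces an existing wrapper instead of appending a second copy of its lines.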
    