
I am developing a Nextflow-based pipeline. It has two processes, given below: the first downloads files and the second runs an R script on them.

process templateExample{
publishDir "data_analysis_files", mode:'copy'     

output:
path "*_gex.csv" , emit: count_files        

script:
'''
"download_files.sh"
'''   

}



process read_count_p{

publishDir "results",mode:'copy'
input:
path count_files


output:
path "result.txt"

"""
Rscript read_count.R ${count_files}
"""
 }


 workflow {
 
 templateExample()
 read_count_p(templateExample.out.count_files)
 
   }

The scripts download_files.sh and read_count.R are both in the bin folder. The problem is that when I run Nextflow, it finds and executes the bash script download_files.sh from the bin folder, but not the R script read_count.R. The bash script, the R script, and the error are given below.

#!/bin/bash

# Define the URLs of the files to download
urls=(
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz" 
    "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
    "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
    )


# Download each file using wget
for url in "${urls[@]}"; do
    wget "$url"
done

# Unzip each downloaded file using gunzip
for file in *.gz;do
    gunzip "$file"
done

The R script is

#!/user/bin/R
args <- commandArgs(trailingOnly = TRUE)
print(args[0])
my_vec <- c(args[0],args[1],args[0],class(args),args[2])
write.table(my_vec,"result1.txt")

And the error is given below,

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
    Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
    executor >  local (2)
[8d/2e0586] process > templateExample [100%] 1 of 1 ✔
[6d/17dc6a] process > read_count_p    [100%] 1 of 1, failed: 1 ✘
    ERROR ~ Error executing process > 'read_count_p'

Caused by:
     Process `read_count_p` terminated with an error exit status (2)

Command executed:

     Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
      2

Command output:
     Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
      Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
      /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

-- Check '.nextflow.log' file for details

The .nextflow.log is given below,

acheema@acri-AS-1124US-TNRP:~$ cat .nextflow.log
May-08 15:18:54.580 [main] DEBUG nextflow.cli.Launcher - $> nextflow run single_cell.nf
May-08 15:18:54.712 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.10.1
May-08 15:18:54.734 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/acheema/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
May-08 15:18:54.743 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-08 15:18:54.744 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-08 15:18:54.747 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-08 15:18:54.757 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
May-08 15:18:54.817 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-08 15:18:54.832 [main] INFO  nextflow.cli.CmdRun - Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-08 15:18:54.840 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/acheema/.nextflow/secrets/store.json
May-08 15:18:54.846 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@783ec989] - activable => nextflow.secret.LocalSecretsProvider@783ec989
May-08 15:18:54.899 [main] DEBUG nextflow.Session - Session UUID: 34564b50-df93-4baa-8861-cba8231186f4
May-08 15:18:54.900 [main] DEBUG nextflow.Session - Run name: soggy_sanger
May-08 15:18:54.901 [main] DEBUG nextflow.Session - Executor pool size: 128
May-08 15:18:54.908 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-08 15:18:54.911 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:18:54.938 [main] DEBUG nextflow.cli.CmdRun -
  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC (18:01 ADT)
  System: Linux 5.4.0-150-generic
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 11.0.19+7-post-Ubuntu-0ubuntu118.04.1
  Encoding: UTF-8 (ANSI_X3.4-1968)
  Process: 18550@acri-AS-1124US-TNRP [127.0.1.1]
  CPUs: 128 - Mem: 1007.8 GB (709.6 GB) - Swap: 2 GB (2 GB)
May-08 15:18:54.958 [main] DEBUG nextflow.Session - Work-dir: /home/acheema/work [ext2/ext3]
May-08 15:18:55.011 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-08 15:18:55.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-08 15:18:55.057 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-08 15:18:55.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 129; maxThreads: 1000
May-08 15:18:55.114 [main] DEBUG nextflow.Session - Session start
May-08 15:18:55.644 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.710 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
May-08 15:18:55.714 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1007.8 GB; capacity=128; pollInterval=100ms; dumpInterval=5m
May-08 15:18:55.716 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: templateExample, read_count_p
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
May-08 15:18:55.829 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > templateExample
May-08 15:18:55.830 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > read_count_p
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_1e152ad49ae18340: /home/acheema/single_cell.nf
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-08 15:18:55.831 [main] DEBUG nextflow.Session - Session await
May-08 15:18:55.991 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:18:55.995 [Task submitter] INFO  nextflow.Session - [8d/2e0586] Submitted process > templateExample
May-08 15:19:07.473 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: templateExample; status: COMPLETED; exit: 0; error: -; workDir: /home/acheema/work/8d/2e0586013131bee894e6322a38edf7]
May-08 15:19:07.504 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:19:07.537 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:19:07.538 [Task submitter] INFO  nextflow.Session - [6d/17dc6a] Submitted process > read_count_p
May-08 15:19:07.610 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: read_count_p; status: COMPLETED; exit: 2; error: -; workDir: /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428]
May-08 15:19:07.618 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=read_count_p; work-dir=/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
  error [nextflow.exception.ProcessFailedException]: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.632 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'read_count_p'

Caused by:
  Process `read_count_p` terminated with an error exit status (2)

Command executed:

  Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
  2

Command output:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
  /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-08 15:19:07.635 [main] DEBUG nextflow.Session - Session await > all processes finished
May-08 15:19:07.638 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.654 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-08 15:19:07.655 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-08 15:19:07.667 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=11.4s; failedDuration=41ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
May-08 15:19:07.856 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-08 15:19:07.879 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

But when I give the absolute path to the R script, it works fine.

script:
"""
Rscript /home/acheema/bin/read_count.R ${count_files}
"""

Now it runs without errors, as shown below:

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
Launching `single_cell.nf` [astonishing_lorenz] DSL2 - revision: f279637f1a
executor >  local (2)
[26/99ed30] process > templateExample [100%] 1 of 1 ✔
[35/db989a] process > read_count_p    [100%] 1 of 1 ✔

Is there a way for the R script to be found and read from the bin folder? I have tried the solutions suggested here, but they did not work. Is there a solution?

2 Answers


  1. This might have something to do with file permissions, but it’s hard to say since the first process works.

    What I do is pass the R script into the process through a channel, like any other input file. An added benefit is that you can add a file-exists check that throws an error before the pipeline starts, rather than halfway through, if the R script is missing.

    Also, I would just paste the contents of download_files.sh into the script block of the process. It’s what Nextflow was designed for. The same goes for the R script, but that would be more annoying to change, so I’ll leave it.

    process templateExample {
      publishDir "data_analysis_files", mode:'copy'     
    
      output:
      path "*_gex.csv" , emit: count_files        
      
      // Single-quoted block: Nextflow would otherwise try to interpolate
      // the bash variables ($url, $file, ${urls[@]}) as Groovy expressions.
      script:
      '''
      #!/bin/bash
      
      # Define the URLs of the files to download
      urls=(
          "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
          "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
          "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
          "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz"
          "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
          "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
          "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
          "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
          )
      
      
      # Download each file using wget
      for url in "${urls[@]}"; do
          wget "$url"
      done
      
      # Unzip each downloaded file using gunzip
      for file in *.gz; do
          gunzip "$file"
      done
      '''
    }
    
    
    process read_count_p {
      publishDir "results",mode:'copy'
    
      input:
      path count_files
      path read_counts_rscript
    
      output:
      path "result.txt"
    
      """
      Rscript ${read_counts_rscript} ${count_files}
      """
    }
    
    
     workflow {
       templateExample()
       read_count_p(templateExample.out.count_files, read_counts_rscript )
     }
    

    Then create the following channel in the workflow block, before it is passed to read_count_p:

    Channel
       .fromPath(params.read_counts_rscript)
       .ifEmpty { error "No merging Rscript supplied: ${params.read_counts_rscript}" }
       .set { read_counts_rscript }
    

    EDIT: I had not updated the workflow declaration with the new Rscript channel; it is fixed now.

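  2. Your R script needs to be in the bin directory of the pipeline directory (which in your case seems to be ~) and should be executable (i.e. chmod +x). Nextflow adds that bin directory to the task's PATH, so the script is found when it is called directly by name, not when it is passed as a file argument to Rscript. Are you certain that your R binary is in /user/bin (not /usr)?
    You could also change the shebang to #!/usr/bin/env Rscript and do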

    """
    read_count.R ${count_files}
    """
    
    Login or Signup to reply.
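    The difference between calling a script by name and passing it as a file argument can be reproduced outside Nextflow. The following is a minimal sketch, not the poster's setup: the temporary directory layout, the hello.sh script, and the use of cat as a stand-in for Rscript are all assumptions made for illustration.

    ```shell
    #!/usr/bin/env bash
    set -eu

    # Hypothetical layout: a pipeline bin/ directory and a separate task work directory.
    demo=$(mktemp -d)
    mkdir -p "$demo/bin" "$demo/work"

    # An executable script with a shebang, standing in for read_count.R.
    printf '#!/usr/bin/env bash\necho "hello from bin"\n' > "$demo/bin/hello.sh"
    chmod +x "$demo/bin/hello.sh"

    # Mimic a task: run from the work directory with bin/ prepended to PATH.
    cd "$demo/work"
    export PATH="$demo/bin:$PATH"

    # 1. Invoked by bare name: the shell searches PATH and finds the script.
    by_name=$(hello.sh)

    # 2. Passed as a file argument: the program (cat here, Rscript in the question)
    #    opens it relative to the current directory, where it does not exist.
    if cat hello.sh >/dev/null 2>&1; then as_arg="found"; else as_arg="not found"; fi

    # 3. Resolving the full path through PATH first makes the file argument work.
    resolved=$(command -v hello.sh)
    via_path=$(bash "$resolved")

    echo "by name:       $by_name"
    echo "as argument:   $as_arg"
    echo "resolved path: $via_path"
    ```

    Step 3 suggests one more option for the process script: resolve the script with command -v and hand the result to Rscript. Inside a """ block the shell $ would need to be escaped as \$, and command -v only finds the file if it is executable.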