Ubuntu - nextflow reading bash script from bin folder but not Rscript

AmmarSabirCheema
May 10, 2024

I am developing a Nextflow-based pipeline with two processes, shown below: the first downloads the input files and the second runs an R script on them.

process templateExample{
publishDir "data_analysis_files", mode:'copy'     

output:
path "*_gex.csv" , emit: count_files        

script:
'''
"download_files.sh"
'''   

}



process read_count_p{

publishDir "results",mode:'copy'
input:
path count_files


output:
path "result.txt"

"""
Rscript read_count.R ${count_files}
"""
 }


 workflow {
 
 templateExample()
 read_count_p(templateExample.out.count_files)
 
   }

The scripts download_files.sh and read_count.R are both present in the bin folder. The problem is that when I run Nextflow, it finds and executes the Bash script download_files.sh from the bin folder, but not the R script read_count.R. The Bash script, the R script, and the error are given below.

#!/bin/bash

# Define the URLs of the files to download
urls=(
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz" 
    "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
    "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
    )


# Download each file using wget
for url in "${urls[@]}"; do
    wget "$url"
done

# Unzip each downloaded file using gunzip
for file in *.gz;do
    gunzip "$file"
done

The R script is

#!/user/bin/R
args <- commandArgs(trailingOnly = TRUE)
print(args[0])
my_vec <- c(args[0],args[1],args[0],class(args),args[2])
write.table(my_vec,"result1.txt")

And the error is given below,

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
    Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
    executor >  local (2)
[8d/2e0586] process > templateExample [100%] 1 of 1 ✔
[6d/17dc6a] process > read_count_p    [100%] 1 of 1, failed: 1 ✘
    ERROR ~ Error executing process > 'read_count_p'

Caused by:
     Process `read_count_p` terminated with an error exit status (2)

Command executed:

     Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
      2

Command output:
     Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
      Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
      /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

-- Check '.nextflow.log' file for details

The .nextflow.log is given below,

acheema@acri-AS-1124US-TNRP:~$ cat .nextflow.log
May-08 15:18:54.580 [main] DEBUG nextflow.cli.Launcher - $> nextflow run single_cell.nf
May-08 15:18:54.712 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.10.1
May-08 15:18:54.734 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/acheema/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
May-08 15:18:54.743 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-08 15:18:54.744 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-08 15:18:54.747 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-08 15:18:54.757 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
May-08 15:18:54.817 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-08 15:18:54.832 [main] INFO  nextflow.cli.CmdRun - Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-08 15:18:54.840 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/acheema/.nextflow/secrets/store.json
May-08 15:18:54.846 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@783ec989] - activable => nextflow.secret.LocalSecretsProvider@783ec989
May-08 15:18:54.899 [main] DEBUG nextflow.Session - Session UUID: 34564b50-df93-4baa-8861-cba8231186f4
May-08 15:18:54.900 [main] DEBUG nextflow.Session - Run name: soggy_sanger
May-08 15:18:54.901 [main] DEBUG nextflow.Session - Executor pool size: 128
May-08 15:18:54.908 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-08 15:18:54.911 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:18:54.938 [main] DEBUG nextflow.cli.CmdRun -
  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC (18:01 ADT)
  System: Linux 5.4.0-150-generic
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 11.0.19+7-post-Ubuntu-0ubuntu118.04.1
  Encoding: UTF-8 (ANSI_X3.4-1968)
  Process: 18550@acri-AS-1124US-TNRP [127.0.1.1]
  CPUs: 128 - Mem: 1007.8 GB (709.6 GB) - Swap: 2 GB (2 GB)
May-08 15:18:54.958 [main] DEBUG nextflow.Session - Work-dir: /home/acheema/work [ext2/ext3]
May-08 15:18:55.011 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-08 15:18:55.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-08 15:18:55.057 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-08 15:18:55.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 129; maxThreads: 1000
May-08 15:18:55.114 [main] DEBUG nextflow.Session - Session start
May-08 15:18:55.644 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.710 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
May-08 15:18:55.714 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1007.8 GB; capacity=128; pollInterval=100ms; dumpInterval=5m
May-08 15:18:55.716 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: templateExample, read_count_p
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
May-08 15:18:55.829 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > templateExample
May-08 15:18:55.830 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > read_count_p
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_1e152ad49ae18340: /home/acheema/single_cell.nf
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-08 15:18:55.831 [main] DEBUG nextflow.Session - Session await
May-08 15:18:55.991 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:18:55.995 [Task submitter] INFO  nextflow.Session - [8d/2e0586] Submitted process > templateExample
May-08 15:19:07.473 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: templateExample; status: COMPLETED; exit: 0; error: -; workDir: /home/acheema/work/8d/2e0586013131bee894e6322a38edf7]
May-08 15:19:07.504 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:19:07.537 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:19:07.538 [Task submitter] INFO  nextflow.Session - [6d/17dc6a] Submitted process > read_count_p
May-08 15:19:07.610 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: read_count_p; status: COMPLETED; exit: 2; error: -; workDir: /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428]
May-08 15:19:07.618 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=read_count_p; work-dir=/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
  error [nextflow.exception.ProcessFailedException]: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.632 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'read_count_p'

Caused by:
  Process `read_count_p` terminated with an error exit status (2)

Command executed:

  Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
  2

Command output:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
  /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-08 15:19:07.635 [main] DEBUG nextflow.Session - Session await > all processes finished
May-08 15:19:07.638 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.654 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-08 15:19:07.655 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-08 15:19:07.667 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=11.4s; failedDuration=41ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
May-08 15:19:07.856 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-08 15:19:07.879 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

But when I give the absolute path to the R script then it works fine.

script:
"""
Rscript /home/acheema/bin/read_count.R ${count_files}
"""

Now it works fine as given below,

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
Launching `single_cell.nf` [astonishing_lorenz] DSL2 - revision: f279637f1a
executor >  local (2)
[26/99ed30] process > templateExample [100%] 1 of 1 ✔
[35/db989a] process > read_count_p    [100%] 1 of 1 ✔

Is there a way for the R script to be found and run from the bin folder? I have tried the solutions suggested here, but they did not work.

Answers

This might have something to do with file permissions, but it’s hard to say since the first process works.

What I do is pass the R script into the process through a channel, like any other input file. A further benefit is that you can check that the file exists and throw an error before the pipeline starts, rather than halfway through when the R script turns out to be missing.

Also, I would just paste the contents of download_files.sh into the script block of the process. It’s what Nextflow was designed for. The same goes for the R script, but that would be more annoying to change, so I’ll leave it.

process templateExample {
  publishDir "data_analysis_files", mode:'copy'     

  output:
  path "*_gex.csv" , emit: count_files        
  
  script:
  """
  #!/bin/bash
  
  # Define the URLs of the files to download
  urls=(
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz" 
      "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
      "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
      "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
      "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
      )
  
  
  # Dollar signs are escaped so that Nextflow passes them through to Bash
  # Download each file using wget
  for url in "\${urls[@]}"; do
      wget "\$url"
  done
  
  # Unzip each downloaded file using gunzip
  for file in *.gz;do
      gunzip "\$file"
  done
  """
}


process read_count_p {
  publishDir "results",mode:'copy'

  input:
  path count_files
  path read_counts_rscript

  output:
  path "result.txt"

  """
  Rscript ${read_counts_rscript} ${count_files}
  """
}


 workflow {
   templateExample()
   read_count_p(templateExample.out.count_files, read_counts_rscript )
 }

And define the following channel before the workflow block, with params.read_counts_rscript set to the path of the R script:

Channel
   .fromPath(params.read_counts_rscript)
   .ifEmpty { error "No merging Rscript supplied: ${params.read_counts_rscript}" }
   .set { read_counts_rscript }
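
The ifEmpty guard above stops the run early when the script path matches nothing. As a rough sketch, the same fail-early idea can also be applied from the shell before calling nextflow run, and extended to cover the executable bit. The helper name check_script and the temporary files are hypothetical, used only for illustration; nothing here is part of Nextflow itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (the name check_script is illustrative only).
# Returns 0 if the script exists and is executable, 1 if it is missing,
# and 2 if it exists but lacks the executable bit.
check_script() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable (try chmod +x): $script" >&2
        return 2
    fi
    return 0
}

# Self-contained demonstration against temporary files (assumed layout).
demo=$(mktemp -d)
printf '#!/usr/bin/env Rscript\n' > "$demo/read_count.R"

rc_missing=0; check_script "$demo/nope.R" 2>/dev/null       || rc_missing=$?
rc_noexec=0;  check_script "$demo/read_count.R" 2>/dev/null || rc_noexec=$?
chmod +x "$demo/read_count.R"
rc_ok=0;      check_script "$demo/read_count.R"             || rc_ok=$?

echo "missing=$rc_missing noexec=$rc_noexec ok=$rc_ok"
```

In practice one would call something like check_script bin/read_count.R && nextflow run single_cell.nf, so a missing or non-executable script is reported before any task is submitted.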

EDIT: Didn’t update the workflow declaration with the new Rscript channel.

- niklas
- May 10, 2024 at 3:33 pm
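Your R script needs to be in the bin folder of the pipeline directory (which in your case seems to be ~) and should be executable (i.e. chmod +x). Nextflow adds that bin folder to PATH, so a script there is found when it is called by name as a command; Rscript read_count.R instead looks for the file in the task's work directory. Are you certain that your R binary is in /user/bin (not /usr)?
You could also change the shebang to #!/usr/bin/env Rscript and do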
```
"""
read_count.R ${count_files}
"""
```
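
To illustrate why calling the script by name behaves differently from passing it to Rscript, here is a minimal sketch that runs without R installed. It assumes a throwaway directory layout, uses a small Bash script (hello_count.sh, a hypothetical name) in place of read_count.R, and uses cat as a stand-in for any program that, like Rscript, treats its argument as a file path relative to the current directory.

```shell
#!/usr/bin/env bash
# Stand-in demo: all paths and file names below are temporary and hypothetical.
project=$(mktemp -d)               # plays the role of the pipeline directory
mkdir -p "$project/bin" "$project/work"

# An executable script with an env-based shebang, placed in bin/.
cat > "$project/bin/hello_count.sh" <<'EOF'
#!/usr/bin/env bash
echo "got $# args"
EOF
chmod +x "$project/bin/hello_count.sh"

# The pipeline's bin directory is put on PATH; tasks run in a separate work dir.
export PATH="$project/bin:$PATH"
cd "$project/work"

# 1) Name passed as a file argument: the program looks in the current
#    directory only, not on PATH, so the file is not found.
if cat hello_count.sh >/dev/null 2>&1; then
    as_file_arg="found"
else
    as_file_arg="not found"
fi

# 2) Name used as a command: the shell resolves it through PATH.
as_command=$(hello_count.sh a b)

echo "passed as a file argument: $as_file_arg"
echo "run as a command on PATH:  $as_command"
```

The first case mirrors Rscript read_count.R failing inside the work directory, and the second mirrors calling read_count.R directly once it is executable and has a valid shebang.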