I am developing a Nextflow-based pipeline with the two processes given below: one downloads the files and the other runs an R script on them.
process templateExample {
    publishDir "data_analysis_files", mode: 'copy'

    output:
    path "*_gex.csv", emit: count_files

    script:
    '''
    "download_files.sh"
    '''
}

process read_count_p {
    publishDir "results", mode: 'copy'

    input:
    path count_files

    output:
    path "result.txt"

    script:
    """
    Rscript read_count.R ${count_files}
    """
}

workflow {
    templateExample()
    read_count_p(templateExample.out.count_files)
}
The scripts download_files.sh and read_count.R are both in the bin folder. The problem is that when I run Nextflow, it finds and executes the bash script download_files.sh from the bin folder, but not the R script read_count.R. The bash script, the R script and the error are given below.
#!/bin/bash
# Define the URLs of the files to download
urls=(
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz"
    "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
    "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
)
# Download each file using wget
for url in "${urls[@]}"; do
    wget "$url"
done
# Unzip each downloaded file using gunzip
for file in *.gz; do
    gunzip "$file"
done
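Separately from the bin question, two groups of URLs in this script probably do not download what is intended. wget typically names each file after the last part of the URL, so the Zenodo files are saved with the `?download=1` suffix still attached, and the `github.com/.../blob/...` links return GitHub's HTML page rather than the R source. A minimal sketch of one way to normalize both, assuming the raw file is served from `raw.githubusercontent.com` under the same owner/repo/branch/path; `to_raw_url` and `local_name` are hypothetical helpers, and the loop only prints the `wget -O` commands instead of running them:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite a GitHub "blob" page URL to its raw-file URL.
# Any other URL is printed unchanged.
to_raw_url() {
  printf '%s\n' "$1" |
    sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}

# Hypothetical helper: local file name = last path component without any "?query".
local_name() {
  local name="${1##*/}"
  printf '%s\n' "${name%%\?*}"
}

urls=(
  "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
  "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
)

# Dry run: print the wget commands; remove "echo" to actually download.
for url in "${urls[@]}"; do
  echo wget -O "$(local_name "$url")" "$(to_raw_url "$url")"
done
```

The GEO `.csv.gz` URLs pass through both helpers unchanged, so the existing gunzip loop would still apply to them.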
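The R script is:
#!/user/bin/R
args <- commandArgs(trailingOnly = TRUE)
# R vectors are 1-indexed: args[1] is the first file name (args[0] is always empty)
print(args[1])
my_vec <- c(args[1], class(args), args[2])
# Write the file name that the process declares as its output
write.table(my_vec, "result.txt")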
And the error is given below:
acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W ~ version 23.10.1
Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
executor > local (2)
[8d/2e0586] process > templateExample [100%] 1 of 1 ✔
[6d/17dc6a] process > read_count_p [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'read_count_p'
Caused by:
Process `read_count_p` terminated with an error exit status (2)
Command executed:
Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv
Command exit status:
2
Command output:
Fatal error: cannot open file 'read_count.R': No such file or directory
Command error:
Fatal error: cannot open file 'read_count.R': No such file or directory
Work dir:
/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
The .nextflow.log is given below:
acheema@acri-AS-1124US-TNRP:~$ cat .nextflow.log
May-08 15:18:54.580 [main] DEBUG nextflow.cli.Launcher - $> nextflow run single_cell.nf
May-08 15:18:54.712 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 23.10.1
May-08 15:18:54.734 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/acheema/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
May-08 15:18:54.743 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-08 15:18:54.744 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-08 15:18:54.747 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-08 15:18:54.757 [main] INFO org.pf4j.AbstractPluginManager - No plugins
May-08 15:18:54.817 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-08 15:18:54.832 [main] INFO nextflow.cli.CmdRun - Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-08 15:18:54.840 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/acheema/.nextflow/secrets/store.json
May-08 15:18:54.846 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@783ec989] - activable => nextflow.secret.LocalSecretsProvider@783ec989
May-08 15:18:54.899 [main] DEBUG nextflow.Session - Session UUID: 34564b50-df93-4baa-8861-cba8231186f4
May-08 15:18:54.900 [main] DEBUG nextflow.Session - Run name: soggy_sanger
May-08 15:18:54.901 [main] DEBUG nextflow.Session - Executor pool size: 128
May-08 15:18:54.908 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-08 15:18:54.911 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:18:54.938 [main] DEBUG nextflow.cli.CmdRun -
Version: 23.10.1 build 5891
Created: 12-01-2024 22:01 UTC (18:01 ADT)
System: Linux 5.4.0-150-generic
Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 11.0.19+7-post-Ubuntu-0ubuntu118.04.1
Encoding: UTF-8 (ANSI_X3.4-1968)
Process: 18550@acri-AS-1124US-TNRP [127.0.1.1]
CPUs: 128 - Mem: 1007.8 GB (709.6 GB) - Swap: 2 GB (2 GB)
May-08 15:18:54.958 [main] DEBUG nextflow.Session - Work-dir: /home/acheema/work [ext2/ext3]
May-08 15:18:55.011 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-08 15:18:55.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-08 15:18:55.057 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-08 15:18:55.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 129; maxThreads: 1000
May-08 15:18:55.114 [main] DEBUG nextflow.Session - Session start
May-08 15:18:55.644 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.710 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
May-08 15:18:55.714 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1007.8 GB; capacity=128; pollInterval=100ms; dumpInterval=5m
May-08 15:18:55.716 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: templateExample, read_count_p
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
May-08 15:18:55.829 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > templateExample
May-08 15:18:55.830 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > read_count_p
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
Script_1e152ad49ae18340: /home/acheema/single_cell.nf
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-08 15:18:55.831 [main] DEBUG nextflow.Session - Session await
May-08 15:18:55.991 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:18:55.995 [Task submitter] INFO nextflow.Session - [8d/2e0586] Submitted process > templateExample
May-08 15:19:07.473 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: templateExample; status: COMPLETED; exit: 0; error: -; workDir: /home/acheema/work/8d/2e0586013131bee894e6322a38edf7]
May-08 15:19:07.504 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:19:07.537 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:19:07.538 [Task submitter] INFO nextflow.Session - [6d/17dc6a] Submitted process > read_count_p
May-08 15:19:07.610 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: read_count_p; status: COMPLETED; exit: 2; error: -; workDir: /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428]
May-08 15:19:07.618 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=read_count_p; work-dir=/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
error [nextflow.exception.ProcessFailedException]: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.632 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'read_count_p'
Caused by:
Process `read_count_p` terminated with an error exit status (2)
Command executed:
Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv
Command exit status:
2
Command output:
Fatal error: cannot open file 'read_count.R': No such file or directory
Command error:
Fatal error: cannot open file 'read_count.R': No such file or directory
Work dir:
/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-08 15:19:07.635 [main] DEBUG nextflow.Session - Session await > all processes finished
May-08 15:19:07.638 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.654 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-08 15:19:07.655 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-08 15:19:07.667 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=11.4s; failedDuration=41ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
May-08 15:19:07.856 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-08 15:19:07.879 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
But when I give the absolute path to the R script, it works:
    script:
    """
    Rscript /home/acheema/bin/read_count.R ${count_files}
    """
The run then succeeds, as shown below:
acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W ~ version 23.10.1
Launching `single_cell.nf` [astonishing_lorenz] DSL2 - revision: f279637f1a
executor > local (2)
[26/99ed30] process > templateExample [100%] 1 of 1 ✔
[35/db989a] process > read_count_p [100%] 1 of 1 ✔
Is there a way for the R script to be found and read from the bin folder, the same way the bash script is? I have tried the solutions suggested here, but they did not work.
Answers
This might have something to do with file permissions, but it is hard to say, since the first process works.
What I do is read the R script into a value channel and pass it in like any other input. Another benefit is that you can add a file-exists check that throws an error before the pipeline starts, rather than halfway through, if the R script is missing.
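Inside Nextflow, `file(..., checkIfExists: true)` is the built-in way to make a missing file stop the run early. Outside Nextflow, the same fail-early idea, together with the permissions angle, can be approximated by a small shell check run before `nextflow run`. A minimal sketch; `check_bin` is a hypothetical helper, and the script names are the ones from the question:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report helper scripts in a bin/ directory
# that are missing or lack the executable bit. Returns non-zero on any problem.
check_bin() {
  local bin_dir="$1"
  shift
  local status=0 name
  for name in "$@"; do
    if [ ! -f "$bin_dir/$name" ]; then
      echo "missing: $bin_dir/$name" >&2
      status=1
    elif [ ! -x "$bin_dir/$name" ]; then
      echo "not executable (try chmod +x): $bin_dir/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# Demo in a scratch directory: one executable script, one without the +x bit.
demo=$(mktemp -d)
mkdir "$demo/bin"
printf '#!/bin/sh\n' > "$demo/bin/download_files.sh"
printf '#!/usr/bin/env Rscript\n' > "$demo/bin/read_count.R"
chmod +x "$demo/bin/download_files.sh"

if check_bin "$demo/bin" download_files.sh read_count.R; then
  result=ok
else
  result=failed   # read_count.R is reported as not executable
fi
echo "pre-flight: $result"

# Real use (assumption: run from the pipeline directory):
#   check_bin bin download_files.sh read_count.R && nextflow run single_cell.nf
```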
Also, I would just paste the contents of download_files.sh into the script block of the process; that is what Nextflow was designed for. The same goes for the R script, but that would be more annoying to change, so I'll leave it. And add the following channel to your channel creation script block
EDIT: Didn’t update the workflow declaration with the new Rscript channel.
Your R script needs to be in the bin folder of the pipeline directory (which in your case seems to be ~) and should be executable (i.e. chmod +x). Are you certain that your R binary is in /user/bin (not /usr/bin)? You could also change the shebang to
#!/usr/bin/env Rscript
and do
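The difference between the two processes comes down to how each script name is resolved. Nextflow adds the pipeline's bin directory to the PATH of every task, so a bare command such as download_files.sh is found through PATH. In `Rscript read_count.R`, however, read_count.R is only a file argument, which Rscript tries to open in the task work directory, hence the "cannot open file" error in the log. A minimal shell sketch of that difference, assuming two temporary directories stand in for the pipeline directory and the task work directory, `cat` stands in for Rscript (both just open the path they are given), and `tool.sh` is a hypothetical helper script:

```shell
#!/usr/bin/env bash
set -u

project=$(mktemp -d)   # stand-in for the pipeline directory (~ in the question)
workdir=$(mktemp -d)   # stand-in for the task work directory
mkdir "$project/bin"

# Stand-in for download_files.sh / read_count.R: a shebang line plus the executable bit.
printf '#!/bin/sh\necho "ran with: $*"\n' > "$project/bin/tool.sh"
chmod +x "$project/bin/tool.sh"

# Emulate Nextflow putting the pipeline's bin/ on the task PATH, then enter the work dir.
PATH="$project/bin:$PATH"
cd "$workdir"

# 1) A program given a bare file name opens it in the work dir only -> not found.
if cat tool.sh >/dev/null 2>&1; then by_arg=found; else by_arg=missing; fi

# 2) The bare command name is resolved through PATH -> runs.
by_name=$(tool.sh a b)

# 3) Resolve the full path through PATH first, then hand it over -> found.
if cat "$(command -v tool.sh)" >/dev/null 2>&1; then by_lookup=found; else by_lookup=missing; fi

echo "file argument: $by_arg"
echo "bare command:  $by_name"
echo "command -v:    $by_lookup"
```

Case 2 matches the shebang plus chmod +x route: the process script should then be able to call read_count.R by name, exactly like download_files.sh. Case 3 is a PATH-based alternative to hard-coding /home/acheema/bin; assuming read_count.R is executable, inside the process it would read `Rscript \$(command -v read_count.R) ${count_files}`, where the backslash keeps Nextflow from interpolating the shell's `$`.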