I’m trying to request a VM with 128 GB of memory through Nextflow, but it keeps failing. I can request VMs with 64 GB, but I get the following errors when I try to request more:
Nextflow stdout:
ERROR ~ Error executing process > 'DBDownload:INDEXDB'
Caused by:
Process `DBDownload:INDEXDB` terminated with an error exit status (null)
...
Command exit status:
null
Command output:
(empty)
.nextflow.log (I’ve removed GCP project information)
Sep-05 16:23:00.919 [main] DEBUG nextflow.cli.Launcher - $> nextflow run af2mm.nf -c af2mm_gcp.config -with-wave --Mode dbdownload --skip_download --skip_exprofile
Sep-05 16:23:01.116 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 23.05.0-edge
Sep-05 16:23:01.128 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/dthorbur/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
Sep-05 16:23:01.165 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-05 16:23:01.175 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-05 16:23:01.178 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Sep-05 16:23:01.220 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Sep-05 16:23:01.240 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /mnt/c/Users/miles/Documents/Resurrect_Bio/Scripts/aws_nf/af2mm_gcp.config
Sep-05 16:23:01.241 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /mnt/c/Users/miles/Documents/Resurrect_Bio/Scripts/aws_nf/af2mm_gcp.config
Sep-05 16:23:01.256 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-05 16:23:01.764 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Sep-05 16:23:01.780 [main] INFO nextflow.cli.CmdRun - Launching `af2mm.nf` [adoring_williams] DSL2 - revision: fcb07b2d08
Sep-05 16:23:01.781 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[[email protected]]
Sep-05 16:23:01.783 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[[email protected], [email protected]]
Sep-05 16:23:01.784 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-google version: 1.7.4
Sep-05 16:23:03.149 [main] INFO org.pf4j.AbstractPluginManager - Plugin '[email protected]' resolved
Sep-05 16:23:03.150 [main] INFO org.pf4j.AbstractPluginManager - Start plugin '[email protected]'
Sep-05 16:23:03.174 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started [email protected]
Sep-05 16:23:03.175 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-wave version: 0.9.0
Sep-05 16:23:03.903 [main] INFO org.pf4j.AbstractPluginManager - Plugin '[email protected]' resolved
Sep-05 16:23:03.903 [main] INFO org.pf4j.AbstractPluginManager - Start plugin '[email protected]'
Sep-05 16:23:03.909 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started [email protected]
Sep-05 16:23:03.917 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/dthorbur/.nextflow/secrets/store.json
Sep-05 16:23:03.919 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@4c5228e7] - activable => nextflow.secret.LocalSecretsProvider@4c5228e7
Sep-05 16:23:03.961 [main] DEBUG nextflow.Session - Session UUID: d7c4b778-39a6-4a9b-9c7e-9c73ced0c9bd
Sep-05 16:23:03.961 [main] DEBUG nextflow.Session - Run name: adoring_williams
Sep-05 16:23:03.962 [main] DEBUG nextflow.Session - Executor pool size: 20
Sep-05 16:23:04.044 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Sep-05 16:23:04.047 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=60; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Sep-05 16:23:04.061 [main] DEBUG nextflow.cli.CmdRun -
Version: 23.05.0-edge build 5861
Created: 15-05-2023 04:13 UTC (05:13 BST)
System: Linux 5.15.90.1-microsoft-standard-WSL2
Runtime: Groovy 3.0.17 on OpenJDK 64-Bit Server VM 11.0.20+8-post-Ubuntu-1ubuntu122.04
Encoding: UTF-8 (UTF-8)
Process: 170716@DESKTOP-JS6IT6O [127.0.1.1]
CPUs: 20 - Mem: 7.6 GB (3 GB) - Swap: 2 GB (1.8 GB)
Sep-05 16:23:04.101 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): gs://colabfoldlocal-db/workDir
Sep-05 16:23:04.103 [main] DEBUG nextflow.Session - Work-dir: gs://colabfoldlocal-db/workDir [null]
Sep-05 16:23:04.105 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /mnt/c/Users/miles/Documents/Resurrect_Bio/Scripts/aws_nf/bin
Sep-05 16:23:04.119 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[GoogleLifeSciencesExecutor, GoogleBatchExecutor]
Sep-05 16:23:04.128 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Sep-05 16:23:04.134 [main] DEBUG nextflow.Session - Observer factory: WaveFactory
Sep-05 16:23:04.135 [main] DEBUG io.seqera.wave.plugin.WaveFactory - Detected Fusion enabled -- Enabling bundle project resources -- Disabling upload of remote bin directory
Sep-05 16:23:04.166 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Sep-05 16:23:04.174 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 21; maxThreads: 1000
Sep-05 16:23:04.411 [main] DEBUG nextflow.Session - Session start
Sep-05 16:23:04.616 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Sep-05 16:23:05.114 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: google-batch
Sep-05 16:23:05.115 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'google-batch'
Sep-05 16:23:05.118 [main] DEBUG nextflow.executor.Executor - [warm up] executor > google-batch (fusion enabled)
Sep-05 16:23:05.123 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'google-batch' > capacity: 1000; pollInterval: 10s; dumpInterval: 5m
Sep-05 16:23:05.131 [main] DEBUG nextflow.cloud.google.GoogleOpts - Google auth via application credentials file: /home/dthorbur/rb-main.json
Sep-05 16:23:05.135 [main] DEBUG n.c.google.batch.GoogleBatchExecutor - [GOOGLE BATCH] Executor config=BatchConfig[googleOpts=GoogleOpts(<details_removed>, location:us-central1, enableRequesterPaysBuckets:false, credentials:ServiceAccountCredentials{<details_removed>})
Sep-05 16:23:05.146 [main] DEBUG n.c.google.batch.client.BatchClient - [GOOGLE BATCH] Creating service client with config credentials
Sep-05 16:23:05.442 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: DOWNLOAD, EXPROFILEDB, DBDownload:INDEXDB, INDEXDB
Sep-05 16:23:05.444 [main] DEBUG nextflow.Session - Igniting dataflow network (4)
Sep-05 16:23:05.455 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > DBDownload:INDEXDB
Sep-05 16:23:05.458 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Sep-05 16:23:05.460 [main] DEBUG nextflow.Session - Session await
Sep-05 16:23:05.463 [PathVisitor-3] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: /ready/august-2023/; pattern: *_READY; options: [:]
Sep-05 16:23:05.463 [PathVisitor-2] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: /raw/august-2023/; pattern: *.gz; options: [:]
Sep-05 16:23:22.227 [Actor Thread 21] DEBUG i.s.wave.plugin.config.WaveConfig - Wave strategy not specified - using default: [container, dockerfile, conda, spack]
Sep-05 16:23:22.232 [Actor Thread 21] DEBUG io.seqera.wave.plugin.WaveClient - Wave server endpoint: https://wave.seqera.io
Sep-05 16:23:23.337 [Actor Thread 21] DEBUG io.seqera.wave.plugin.WaveClient - Wave request container config: https://fusionfs.seqera.io/releases/v2.2-amd64.json
Sep-05 16:23:23.426 [Actor Thread 21] DEBUG io.seqera.wave.plugin.WaveClient - Wave container config response: [200] {
"entrypoint": [ "/usr/bin/fusion" ],
"layers": [
{
"location": "https://fusionfs.seqera.io/releases/pkg/2/2/7/fusion-amd64.tar.gz",
"gzipDigest": "sha256:d0fcc536dd85f32c4387ee8d649404cbf9572e98d408127fb4832aa34ac55f37",
"gzipSize": 30216048,
"tarDigest": "sha256:78a9c8ecdf42aa038e5aa9a41a3863a51eaece82383444cf94ef3eefad927a29",
"skipHashing": true
}
]
}
Sep-05 16:23:23.450 [Actor Thread 21] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:null, towerRefreshToken:null, towerWorkspaceId:null, towerEndpoint:https://api.tower.nf, workflowId:null, containerImage:dthorbur1990/colabfold_dbdownload:latest, containerFile:null, containerConfig:ContainerConfig(entrypoint:[/usr/bin/fusion], cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/2/2/7/fusion-amd64.tar.gz; tarDigest=sha256:78a9c8ecdf42aa038e5aa9a41a3863a51eaece82383444cf94ef3eefad927a29; gzipDigest=sha256:d0fcc536dd85f32c4387ee8d649404cbf9572e98d408127fb4832aa34ac55f37; gzipSize=30216048]]), condaFile:null, spackFile:null, containerPlatform:linux/amd64, buildRepository:null, cacheRepository:null, timestamp:2023-09-05T16:23:23.448422+01:00, fingerprint:168d8c59e937ce651e2ccb0b92e6f26a)
Sep-05 16:23:23.495 [Actor Thread 21] DEBUG io.seqera.wave.plugin.WaveClient - Wave response: statusCode=200; body={"containerToken":"db7b3f9e60ab","targetImage":"wave.seqera.io/wt/db7b3f9e60ab/dthorbur1990/colabfold_dbdownload:latest","expiration":"2023-09-07T03:23:23.118116590Z","containerImage":"docker.io/dthorbur1990/colabfold_dbdownload:latest"}
Sep-05 16:23:27.870 [Task submitter] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] submitted > job=nf-3a9439ca-1693927405535; uid=nf-3a9439ca-169392-fcee835e-0a0d-49970; work-dir=gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1
Sep-05 16:23:27.871 [Task submitter] INFO nextflow.Session - [3a/9439ca] Submitted process > DBDownload:INDEXDB
Sep-05 16:24:35.718 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Terminated job=nf-3a9439ca-1693927405535; state=FAILED
Sep-05 16:24:35.896 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Cannot read exitstatus for task: `DBDownload:INDEXDB` | gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1/.exitcode
Sep-05 16:24:37.732 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: DBDownload:INDEXDB; status: COMPLETED; exit: null; error: -; workDir: gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1]
Sep-05 16:24:37.739 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=DBDownload:INDEXDB; work-dir=gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1
error [nextflow.exception.ProcessFailedException]: Process `DBDownload:INDEXDB` terminated with an error exit status (null)
Sep-05 16:24:37.866 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1/.command.out
Sep-05 16:24:37.994 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://colabfoldlocal-db/workDir/3a/9439caca583a5b2ec3a1cbdfa68bd1/.command.err
Sep-05 16:24:38.000 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'DBDownload:INDEXDB'
GCP Batch Job Details Events:
STATUS_CHANGED 2023-09-05T15:24:21.023034249Z
Job state is set from SCHEDULED to SCHEDULED_PENDING_FAILED for job projects/382883280368/locations/us-central1/jobs/nf-3a9439ca-1693927405535.
OPERATIONAL_INFO 2023-09-05T15:24:21.004335645Z
Job gets non-retryable information Batch Error: code - CODE_GCE_INVALID_FIELD_VALUE, description - Invalid value for field 'operation': ''. No zone supports all of the provided instance templates. The following errors detail the failure..
STATUS_CHANGED 2023-09-05T15:23:37.429911387Z
Job state is set from QUEUED to SCHEDULED for job projects/382883280368/locations/us-central1/jobs/nf-3a9439ca-1693927405535.
What I’m confused about is why I cannot request the high-memory VMs, whether through the cpus, memory, or machineType process directives. Again, the process runs fine with 64 GB requested through the declaration memory '64 GB', but then MMseqs fails with the error Can not touch 72675005914 into main memory (about 72 GB, more than the 64 GB allocation), which is why I need the larger machine.
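For reference, a minimal sketch of how the resource request might look as process directives; the cpus value, the commented machine type, and the script body are illustrative, not taken from the actual workflow:

process INDEXDB {
    cpus   8                       // illustrative CPU count
    memory '128 GB'                // the allocation that fails to schedule
    // a machine type (or wildcard pattern) can also be pinned explicitly:
    // machineType 'n2-highmem-16'

    script:
    """
    echo "run the MMseqs indexing step here"
    """
}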
I’m trying other regions, but I’m using us-central1, which should have high-memory machines available. I will also try removing the spot = true config option in case preemptible VMs can’t be provisioned with that much memory, but I don’t have high hopes.
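For context, a sketch of the relevant google-batch settings, assuming the config scope in af2mm_gcp.config looks roughly like this (the project ID is a placeholder, not the real project):

process.executor = 'google-batch'

google {
    project    = 'my-gcp-project'   // placeholder
    location   = 'us-central1'
    batch.spot = true               // removing this (or setting it to false) falls back to on-demand VMs
}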
2 Answers
I'm not entirely sure why this solution works, but in case anyone else has this issue, here is what worked for me.
My guess is that the sda disk, the NVMe disk, and the Fusion file system count as three mounted disks, which is not allowed for N2 machine types with between 2 and 10 CPUs. I will try mounting another disk to check, but in the meantime the N1 machine type allows three mounted drives and ran successfully with 128 GB of memory.
As simple as adding machineType "n1-*" in the process.
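For anyone applying this from the config file rather than the process definition, a sketch using a standard process selector (the memory value just mirrors the request above):

process {
    withName: 'INDEXDB' {
        machineType = 'n1-*'     // any N1 machine type that satisfies the cpus/memory request
        memory      = '128 GB'
    }
}

With the wildcard, Nextflow/Google Batch should be free to pick whichever N1 machine type meets the requested resources.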
Based on the error, you may want to check your quotas to confirm you have sufficient CPU and memory resources. Here is a guide on viewing and managing quotas. [1]
You may also want to check that the Nextflow configuration has the correct memory setting for the VM instance.
[1] https://cloud.google.com/docs/quota_detail/view_manage#managing_your_quota_console