Problem statement
I have a Python program that needs to launch a number of Singularity containers in parallel.
Is it possible to do this, exploiting all of the available hardware, using only built-in libraries (`subprocess`, `concurrent.futures`, etc.)?
The ‘host’ script runs on 1 CPU and is launched by SLURM. It needs to launch the containers, wait for them to complete, do some analysis, and repeat.
For example, if I have 40 containers each needing 2 CPUs, and two nodes each with 76 CPUs, then there should be something like:
| Node 1 (76 CPUs) | Node 2 (76 CPUs) |
| --- | --- |
| Host script (1 CPU) | 3 containers (6 CPUs) |
| 37 containers (74 CPUs) | 70 spare CPUs |
| 1 spare CPU | |
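For concreteness, the launch / wait / analyse / repeat loop described above would look roughly like the sketch below, using only `concurrent.futures` and `subprocess` from the standard library; `do_analysis` is a hypothetical placeholder for the real analysis step, and the thread-pool approach is just one possible way to wait on the containers:

import subprocess
from concurrent.futures import ThreadPoolExecutor

N_CONTAINERS = 40
COMMAND = ["singularity", "run", "stress.simg", "-c", "2", "-t", "10s"]

def run_container(cmd):
    # Each worker thread simply blocks until its container process exits.
    return subprocess.run(cmd, check=True)

def do_analysis(results):
    # Hypothetical placeholder for the analysis done between rounds.
    pass

for _ in range(3):  # launch, wait for completion, analyse, repeat
    with ThreadPoolExecutor(max_workers=N_CONTAINERS) as pool:
        futures = [pool.submit(run_container, COMMAND) for _ in range(N_CONTAINERS)]
        results = [f.result() for f in futures]
    do_analysis(results)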
MWE
Singularity recipe (`stress.def`)
We use `stress` to fully utilise a given number of CPUs:
Bootstrap: docker
From: ubuntu:16.04
%post
apt update -y
apt install -y stress
%runscript
echo $(uname -n)
stress "$@"
Build with `singularity build stress.simg stress.def`.
Python host script (`main.py`)
Spin up 40 containers, each running the `stress` image with 2 CPUs for 10s:
from subprocess import Popen

n_processes = 40
cpus_per_process = 2
stress_time = 10

command = [
    "singularity",
    "run",
    "stress.simg",
    "-c",
    str(cpus_per_process),
    "-t",
    f"{stress_time}s",
]

processes = [Popen(command) for i in range(n_processes)]
for p in processes:
    p.wait()
SLURM script
#!/bin/bash
#SBATCH -J stress
#SBATCH -A myacc
#SBATCH -p mypart
#SBATCH --output=%x_%j.out
#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=2
#SBATCH --time=24:00:00
python main.py
Results
The above only runs on one of the two nodes. Total execution time is around 20s, and the Singularity containers run in two batches rather than all at once: the first 38 run, and then the last two.
As such, it does not have the desired effect.
Answers
Turns out my question was just from a misunderstanding of what should be handled by Singularity and what should be handled by SLURM.
My mistake was thinking that Singularity could see and utilise other nodes; in reality, it can only see the resources available on the current node.
Solution: keep `ntasks=40` and `cpus-per-task=2` in the SLURM script, and prefix the `singularity` command with `srun`. This will allow SLURM to allocate resources to the containers from the pool that has already been allotted to this particular run.
Modified `main.py`:
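(The modified script itself was not reproduced above; the following is a minimal sketch of what it might look like, assuming each container is launched as its own SLURM job step. The `--ntasks=1`, `--cpus-per-task`, and `--exclusive` flags are standard `srun` options, but their exact combination here is my assumption rather than taken from the original answer.)

from subprocess import Popen

n_processes = 40
cpus_per_process = 2
stress_time = 10

# Wrapping the container launch in srun turns each container into a SLURM
# job step, so it is placed on CPUs (possibly on the other node) taken from
# the resources already allocated to this job.
command = [
    "srun",
    "--ntasks=1",                           # one job step per container
    f"--cpus-per-task={cpus_per_process}",  # 2 CPUs per step
    "--exclusive",                          # do not share these CPUs with other steps
    "singularity",
    "run",
    "stress.simg",
    "-c",
    str(cpus_per_process),
    "-t",
    f"{stress_time}s",
]

processes = [Popen(command) for i in range(n_processes)]
for p in processes:
    p.wait()

If more steps are requested than there are free CPUs in the allocation, SLURM queues the extra steps until CPUs become available, so the 40 containers should spread across both nodes rather than all landing on the node running the host script.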