My system has 4 CPUs and 16 GB RAM. My aim is to deploy Dask distributed workers that each use ONLY 1 CPU to run the code assigned to them.
I am deploying a scheduler container and worker containers with Docker to run code that uses Dask delayed and Dask dataframes.
Following are the Docker run commands for both:
Scheduler
docker run --name dask-scheduler --network host daskdev/dask:2023.1.1-py3.10 dask-scheduler
Worker
(I tried multiple docker run commands, all different combinations of the following)
docker run --name dask-worker --network host daskdev/dask:2023.1.1-py3.10 dask-worker 10.76.8.50:8786 --nworkers 1 --nthreads 1 --resources "process=1" --memory-limit 2GiB
Creating a client and establishing a connection with the scheduler:
client = Client("tcp://10.76.8.50:8786")
What I wanted was that when dask.compute(scheduler="processes") is run, the worker uses only 1 CPU to run the code. However, at least 3 CPUs can be seen running at 100% capacity.
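For context, the compute call looks roughly like this (process_file and files stand in for my actual function and inputs):

import dask

tasks = [dask.delayed(process_file)(f) for f in files]
# This is the call that pegs ~3 CPUs instead of using the single-CPU worker.
results = dask.compute(*tasks, scheduler="processes")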
Is there something I have missed?
How could I limit a distributed worker to use ONLY 1 CPU for this compute work?
2 Answers
By specifying --resources "process=1", this amount of resources is allocated per worker. However, to make sure that each task uses this resource, it is necessary to specify the resource required by the task, for example by annotating the computations. The snippet below makes sure that all the wrapped computations use 1 unit of the process resource per task.
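A minimal sketch of such an annotation (process_file and files are hypothetical placeholders for your own function and inputs):

import dask

# Build the tasks inside the annotate context so every one of them
# requests 1 unit of the "process" resource defined on the worker.
with dask.annotate(resources={"process": 1}):
    tasks = [dask.delayed(process_file)(f) for f in files]

# optimize_graph=False keeps graph optimization from fusing tasks
# and dropping the per-task resource annotations.
results = dask.compute(*tasks, optimize_graph=False)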

You are specifying scheduler="processes", which will completely ignore your distributed cluster. Instead, make a distributed client in the Python session where you are running your code
client = dask.distributed.Client(...)
with the TCP address of the scheduler, and the compute will run on your workers, one thread each, automatically.
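A minimal sketch of that approach, using the scheduler address from the question (process_file and files are the same hypothetical placeholders as above):

import dask
import dask.distributed

# Connecting a Client registers the distributed cluster as the
# default scheduler for subsequent compute calls.
client = dask.distributed.Client("tcp://10.76.8.50:8786")

tasks = [dask.delayed(process_file)(f) for f in files]

# No scheduler="processes" here: the work is sent to the workers,
# each of which runs a single thread as configured.
results = dask.compute(*tasks)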