
My system has 4 CPUs and 16 GB RAM. My aim is to deploy Dask distributed workers that each use ONLY 1 CPU to run the code assigned to them.

I am deploying a scheduler container and worker containers using Docker to run code that uses Dask delayed and Dask dataframes.

Following are the Docker run commands for both:

Scheduler

docker run --name dask-scheduler --network host daskdev/dask:2023.1.1-py3.10 dask-scheduler

Worker
(multiple docker run commands tried, all different combinations of the following command)

docker run --name dask-worker --network host daskdev/dask:2023.1.1-py3.10 dask-worker 10.76.8.50:8786 --nworkers 1 --nthreads 1 --resources "process=1" --memory-limit 2GiB

Creating a client and establishing a connection with the scheduler:
client = Client("tcp://10.76.8.50:8786")

Now what I wanted was that when dask.compute(scheduler="processes") is run, the worker would use only 1 CPU to run the code. However, at least 3 CPUs can be seen at 100% capacity.
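For reference, the computation looks roughly like this (the function and data below are illustrative placeholders, not the exact code):

import dask
import dask.dataframe as dd

@dask.delayed
def process(part):
    return part.sum()       # placeholder work on one partition

ddf = dd.read_csv("data-*.csv")                 # some Dask dataframe
tasks = [process(p) for p in ddf.to_delayed()]

# This is the call in question - note the scheduler="processes" argument.
results = dask.compute(*tasks, scheduler="processes")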

[Screenshot: CPU usage showing multiple cores at 100%]

Is there something I have missed?
How could I limit a distributed worker to use ONLY 1 CPU for this compute work?

2 Answers


  1. By specifying --resources "process=1", this amount of resources is allocated per worker. However, to make sure that each task uses this resource, it’s necessary to specify the resource used by the task. For example, by annotating:

    with dask.annotate(resources=dict(process=1)):
        ...
    

    The above snippet will make sure that all the wrapped computations make use of 1 unit of process resource per task.
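    For illustration, here is a fuller sketch that combines the annotation with a client connected to the scheduler from the question; the delayed function and task list are placeholders:

    import dask
    from dask.distributed import Client

    # Connect to the scheduler so the tasks run on the dockerised workers.
    client = Client("tcp://10.76.8.50:8786")

    @dask.delayed
    def work(x):              # placeholder task - substitute the real logic
        return x * 2

    # Build the tasks inside the annotation context so each one requires
    # 1 unit of the "process" resource defined on the worker.
    with dask.annotate(resources={"process": 1}):
        tasks = [work(i) for i in range(10)]

    # optimize_graph=False keeps the per-task annotations from being
    # dropped during graph optimisation (a known caveat with delayed).
    results = dask.compute(*tasks, optimize_graph=False)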

  2. You are specifying scheduler="processes", which completely ignores your distributed cluster. Instead, create a distributed client in the Python session where you run your code, client = dask.distributed.Client(...), with the TCP address of the scheduler; the compute will then run on your workers, one thread each, automatically, as in the sketch below.
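
    A minimal sketch of that pattern, reusing the scheduler address from the question (the delayed function here is a placeholder):

    import dask
    from dask.distributed import Client

    # Connecting a Client makes it the default scheduler, so a plain
    # dask.compute(...) call runs on the remote workers. Do not pass
    # scheduler="processes", which would bypass the cluster entirely.
    client = Client("tcp://10.76.8.50:8786")

    @dask.delayed
    def work(x):              # placeholder for the real computation
        return x + 1

    results = dask.compute(*[work(i) for i in range(10)])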
