I’ve installed the ClearML test manager solution using the ClearML Docker Compose setup, so the whole thing is now running in 6 containers (webserver, apiserver, redis, elasticsearch, fileserver and mongodb). I’m running the default Cleanup Service, but the task stays in a pending state because there are no workers configured for this queue. How do I configure a worker for the default queue when ClearML is running in Docker?
I tried to run it locally, not using Docker.
2 Answers
Disclaimer: I’m a member of the ClearML team (formerly Trains)
I assume the cleanup service uses the services queue. The server deployment already contains an agent (services-agent) that should be listening to this queue, but it’s probably lacking the credentials to access the server (it’s functioning as a normal client, so it needs credentials).
The docker-compose.yml for ClearML actually has a section that configures this, but it needs the environment variables CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY to be defined. To define these, first go to your profile section in the ClearML UI, generate a new set of credentials, and use their values for the environment variables. Then, once the environment variables are defined, restart docker-compose (using the docker-compose down and docker-compose up commands as shown in the installation and upgrade documentation).
The services agent should appear in the workers and queues page in the ClearML UI once the server is back up.
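As a rough sketch of the steps above, assuming the compose file lives at /opt/clearml/docker-compose.yml (adjust the path to your installation). The key values are placeholders for the credentials you generate in the UI, and CLEARML_COMPOSE_FILE and restart_clearml_server are hypothetical names used only for this illustration:

```shell
# Placeholders: replace with the credentials generated in your ClearML UI profile page.
export CLEARML_API_ACCESS_KEY="<your-access-key>"
export CLEARML_API_SECRET_KEY="<your-secret-key>"

# Assumed location of the compose file; change it if your server is installed elsewhere.
CLEARML_COMPOSE_FILE="/opt/clearml/docker-compose.yml"

# Hypothetical helper: bring the server down and back up so the
# services agent container starts with the new environment variables.
restart_clearml_server() {
    docker-compose -f "$CLEARML_COMPOSE_FILE" down
    docker-compose -f "$CLEARML_COMPOSE_FILE" up -d
}
```

Calling restart_clearml_server in the same shell session (so the exported variables are visible to docker-compose) should restart the stack with the credentials in place.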
Looks like you inserted a new task into the default queue (I assume this is the only queue currently created under your workspace).
You can create a services queue (like Martin.B suggested), but this is not a must; you can just spin up a new clearml-agent listening to the default queue, and this clearml-agent will run your service (like in here).
You should remember that when you spin up a clearml-agent you allocate the resources it will use. The best practice for a service task (not a training task) is to allocate only CPU and no GPU for this agent. Those tasks won’t use any GPU and will be in an idle state most of the time.
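A minimal sketch of spinning up such an agent, assuming clearml-agent is already installed (pip install clearml-agent) and configured with your server credentials (clearml-agent init). The start_cpu_agent helper is a hypothetical name for this illustration, not part of ClearML:

```shell
# Hypothetical helper: start a CPU-only agent on the given queue.
# With no argument it listens to the "default" queue.
start_cpu_agent() {
    queue="${1:-default}"
    # --cpu-only keeps the agent from allocating any GPU for the tasks it runs.
    clearml-agent daemon --queue "$queue" --cpu-only
}
```

Running start_cpu_agent with no arguments starts an agent on the default queue in the foreground; it should then show up as a worker in the ClearML UI.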
You can just run this example and all should work out of the box 🙂