I have a working Azure Container Registry. I am able to do docker pull, push, etc. using service principal authentication.

However, when I submit a job to a compute cluster, the cluster is unable to pull the required Docker images from the container registry.

Here is the output from the cluster run:

2023/09/25 14:39:53 Downloading source code...
2023/09/25 14:39:54 Finished downloading source code
2023/09/25 14:39:54 Creating Docker network: acb_default_network, driver: 'bridge'
2023/09/25 14:39:55 Successfully set up Docker network: acb_default_network
2023/09/25 14:39:55 Setting up Docker configuration...
2023/09/25 14:39:55 Successfully set up Docker configuration
2023/09/25 14:39:55 Logging in to registry: caf220d1b3fa459e8a75611cec79dbe5.azurecr.io
2023/09/25 14:39:56 Successfully logged into caf220d1b3fa459e8a75611cec79dbe5.azurecr.io
2023/09/25 14:39:56 Volume source scriptsFromEms successfully created
2023/09/25 14:39:56 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2023/09/25 14:39:56 Scanning for dependencies...
2023/09/25 14:39:57 Successfully scanned dependencies
2023/09/25 14:39:57 Launching container with name: acb_step_0
Sending build context to Docker daemon  77.31kB

Step 1/21 : FROM tampere.azurecr.io/custom/ubuntu20.04@sha256:aa19088a50e382bbde8b89e9a5848ef3632736a35dd45d991cae9afa76ed5cba
Get "https://tampere.azurecr.io/v2/custom/ubuntu20.04/manifests/sha256:aa19088a50e382bbde8b89e9a5848ef3632736a35dd45d991cae9afa76ed5cba": unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
2023/09/25 14:39:58 Container failed during run: acb_step_0. No retries remaining.
failed to run step ID: acb_step_0: exit status 1

Run ID: cbt failed after 15s. Error: failed during run, err: exit status 1

I am using the Azure ML API v2, like this:

from azure.ai.ml import command
...
job = command(
    code="./",
    command=cli_command,
    environment="mlflow-env:4",  # *must* provide a version
    compute=cluster_name,
    experiment_name=experiment_name,
    display_name=job_name,
    environment_variables={
        "PYTHONUNBUFFERED":"1"
    }
)
print("LAUNCHING JOB")
ml_client.create_or_update(job)
print("JOB LAUNCHED! - HAVE A NICE DAY")

But I haven’t found any way to pass the Docker username and password (from the service principal) to the builder function "command".

It seems that in the old v1 API this can be done: ref

Is the conclusion then that API v2 is broken or am I missing something here?

2 Answers


  1. It seems that the Azure ML SDK v2 does not currently support passing Docker credentials to the command() builder function.

    However, per the documentation, one possible workaround is to use a managed identity.

    If you authenticate with Azure ML using a managed identity, that identity can also be used to pull Docker images from Azure Container Registry, so the environment can be used in your command job.

    For details on the setup, see:
    Build Azure Machine Learning managed environment into base image from private ACR for training or inference.

  2. The image build that materializes your AzureML environment specification runs on an agent that ACR provides to run a task. You can authorize AzureML to pass credentials through the ACR Task to Docker for private registries. The best way to do so is with AzureML Workspace Connections. You specify the registry (target), the connection type (ACR/ContainerRegistry), the AuthType (Basic works in all cases, but I’d recommend generating a pull-only scoped token), and the actual secret. Once the connection is set up, all image builds in the workspace are authorized to pull base images/artifacts from that registry.

    Here are some samples for the v2 CLI/SDK:

    https://github.com/Azure/azureml-examples/blob/main/sdk/python/resources/connections/connections.ipynb

    https://learn.microsoft.com/en-us/cli/azure/ml/connection?view=azure-cli-latest
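
As a rough illustration of the workspace-connection approach, here is a minimal Python sketch. It assumes the `WorkspaceConnection` and `UsernamePasswordConfiguration` classes from `azure.ai.ml.entities` as used in the linked notebook (newer SDK releases may name these differently), an existing `ml_client`, and that the connection target is the registry host of the base image. The helper names (`registry_of`, `build_connection_spec`, `create_connection`), the connection name, and the `ACR_TOKEN_NAME` / `ACR_TOKEN_PASSWORD` environment variables are hypothetical and only for illustration.

```python
import os


def registry_of(image_ref: str) -> str:
    """Return the registry host of a fully qualified image reference,
    e.g. 'tampere.azurecr.io/custom/ubuntu20.04' -> 'tampere.azurecr.io'.
    Assumes the reference starts with a registry host."""
    return image_ref.split("/", 1)[0]


def build_connection_spec(name: str, image_ref: str, username: str, password: str) -> dict:
    """Collect the fields the answer lists: target, connection type,
    auth type, and the secret. Field values are assumptions based on the
    linked samples, not a confirmed schema."""
    return {
        "name": name,
        "type": "container_registry",
        "target": registry_of(image_ref),
        "credentials": {
            "type": "username_password",
            "username": username,
            "password": password,
        },
    }


def create_connection(ml_client, spec: dict):
    """Create the workspace connection (hypothetical helper).
    Imported lazily so the spec can be built without azure-ai-ml installed."""
    from azure.ai.ml.entities import UsernamePasswordConfiguration, WorkspaceConnection

    connection = WorkspaceConnection(
        name=spec["name"],
        type=spec["type"],
        target=spec["target"],
        credentials=UsernamePasswordConfiguration(
            username=spec["credentials"]["username"],
            password=spec["credentials"]["password"],
        ),
    )
    return ml_client.connections.create_or_update(connection)


# Credentials come from the environment; ideally a pull-only scoped token.
spec = build_connection_spec(
    name="tampere-acr-pull",  # hypothetical connection name
    image_ref="tampere.azurecr.io/custom/ubuntu20.04",
    username=os.environ.get("ACR_TOKEN_NAME", "<token-name>"),
    password=os.environ.get("ACR_TOKEN_PASSWORD", "<token-password>"),
)
```

Calling `create_connection(ml_client, spec)` once per workspace should then be enough; the `command(...)` job itself would not need to change, since the credentials are attached to the workspace rather than to the job.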
