
I am using MLflow to track my experiments, with an S3 bucket as the artifact store. To access it, I want to use proxied artifact access, as described in the docs. However, this does not work for me, since the client looks for credentials locally (but the server should handle this).

Expected Behaviour

As described in the docs, I would expect that I do not need to specify my AWS credentials locally, since the server handles this for me. From the docs:

This eliminates the need to allow end users to have direct path access to a remote object store (e.g., s3, adls, gcs, hdfs) for artifact handling and eliminates the need for an end-user to provide access credentials to interact with an underlying object store.
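
For illustration, this is how I read the docs: with a proxied setup, the client should only need the tracking URI, and no AWS credentials or boto3 configuration of its own (the experiment name below is just a placeholder):

import mlflow

# Only the tracking URI is configured; no AWS credentials on the client.
# Artifact uploads should go through the tracking server's REST API
# rather than directly to S3.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("proxy-check")  # placeholder name

with mlflow.start_run():
    mlflow.log_text("hello", "notes.txt")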

Actual Behaviour / Error

Whenever I run an experiment on my machine, I run into the following error:

botocore.exceptions.NoCredentialsError: Unable to locate credentials

So the error occurs locally. However, this should not happen, since the server should handle authentication instead of me having to store credentials locally. I would also expect not to need the boto3 library locally at all.

Solutions Tried

I am aware that I need to create a new experiment, because existing experiments might still use a different artifact location, as proposed in this SO answer as well as in the note in the docs. Creating a new experiment did not solve the error for me. Whenever I run the experiment, an explicit log message in the console confirms that a new experiment is created:

INFO mlflow.tracking.fluent: Experiment with name 'test' does not exist. Creating a new experiment.
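
To double-check which artifact location the new experiment actually got, it can be queried through the client (experiment name taken from the log above); with proxied access working I would expect it to start with mlflow-artifacts:/ rather than s3://:

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
exp = mlflow.get_experiment_by_name("test")
print(exp.artifact_location)  # an s3:// location here would explain the local boto3 error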

Related questions (#1 and #2) refer to a different scenario, which is also described in the docs.

Server Config

The server runs in a Kubernetes pod with the following config:

mlflow server \
    --host 0.0.0.0 \
    --port 5000 \
    --backend-store-uri postgresql://user:pw@endpoint \
    --artifacts-destination s3://my_bucket/artifacts \
    --serve-artifacts \
    --default-artifact-root s3://my_bucket/artifacts

Looking at doc page 1 and page 2, I would expect my config to be correct.

I am able to see the MLflow UI if I forward the port to my local machine. I also see the experiment runs marked as failed because of the error above.

My Code

The relevant part of my code that fails is the logging of the model:

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("test2")

...

# this works
mlflow.log_params(hyperparameters)
                        
model = self._train(model_name, hyperparameters, X_train, y_train)
y_pred = model.predict(X_test)
self._evaluate(y_test, y_pred)

# this fails with the error from above
mlflow.sklearn.log_model(model, "artifacts")

Question

I am probably overlooking something. Is there a need to indicate locally that I want to use proxied artifact access? If yes, how do I do this? Is there something I have missed?

Full Traceback

  File "/dir/venv/lib/python3.9/site-packages/mlflow/models/model.py", line 295, in log
    mlflow.tracking.fluent.log_artifacts(local_path, artifact_path)
  File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 726, in log_artifacts
    MlflowClient().log_artifacts(run_id, local_dir, artifact_path)
  File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/client.py", line 1001, in log_artifacts
    self._tracking_client.log_artifacts(run_id, local_dir, artifact_path)
  File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 346, in log_artifacts
    self._get_artifact_repo(run_id).log_artifacts(local_dir, artifact_path)
  File "/dir/venv/lib/python3.9/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 141, in log_artifacts
    self._upload_file(
  File "/dir/venv/lib/python3.9/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 117, in _upload_file
    s3_client.upload_file(Filename=local_file, Bucket=bucket, Key=key, ExtraArgs=extra_args)
  File "/dir/venv/lib/python3.9/site-packages/boto3/s3/inject.py", line 143, in upload_file
    return transfer.upload_file(
  File "/dir/venv/lib/python3.9/site-packages/boto3/s3/transfer.py", line 288, in upload_file
    future.result()
  File "/dir/venv/lib/python3.9/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/dir/venv/lib/python3.9/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/dir/venv/lib/python3.9/site-packages/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/dir/venv/lib/python3.9/site-packages/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File "/dir/venv/lib/python3.9/site-packages/s3transfer/upload.py", line 758, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 898, in _make_api_call
    http, parsed_response = self._make_request(
  File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 921, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/dir/venv/lib/python3.9/site-packages/botocore/signers.py", line 103, in handler
    return self.sign(operation_name, request)
  File "/dir/venv/lib/python3.9/site-packages/botocore/signers.py", line 187, in sign
    auth.add_auth(request)
  File "/dir/venv/lib/python3.9/site-packages/botocore/auth.py", line 407, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

3 Answers


  1. Chosen as BEST ANSWER

    The problem is that the server was started with the wrong parameters: --default-artifact-root needs to either be removed or set to mlflow-artifacts:/.

    From mlflow server --help:

      --default-artifact-root URI  Directory in which to store artifacts for any
                                   new experiments created. For tracking server
                                   backends that rely on SQL, this option is
                                   required in order to store artifacts. Note that
                                   this flag does not impact already-created
                                   experiments with any previous configuration of
                                   an MLflow server instance. By default, data
                                   will be logged to the mlflow-artifacts:/ uri
                                   proxy if the --serve-artifacts option is
                                   enabled. Otherwise, the default location will
                                   be ./mlruns.
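
    After restarting the server with the corrected flags and creating a fresh experiment, a quick client-side check along these lines (experiment name is a placeholder, not from the original setup) should show the run's artifact URI starting with mlflow-artifacts:/, and logging should no longer require local AWS credentials:

      import mlflow

      mlflow.set_tracking_uri("http://localhost:5000")
      mlflow.set_experiment("test-proxied")  # fresh experiment, placeholder name

      with mlflow.start_run() as run:
          mlflow.log_dict({"check": 1}, "check.json")  # uploaded via the server's artifact proxy
          print(run.info.artifact_uri)  # expected to start with "mlflow-artifacts:/"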
    

  2. I am having the same problem, and the accepted answer does not seem to solve my issue.
    Neither removing the flag nor setting mlflow-artifacts:/ instead of s3:// worked for me. Moreover, it gave me an error saying that, since I have a remote backend-store-uri, I need to set default-artifact-root when running the mlflow server.

    How I solved it: I find the error self-explanatory. The reason it states that it was unable to locate credentials is that MLflow uses boto3 underneath for all the transactions. Since I had set up my environment variables in a .env file, just loading that file was enough to solve the issue for me. If you have a similar scenario, run the following commands before starting your mlflow server:

    set -a
    source .env
    set +a
    

    This will load the environment variables and you will be good to go.

    Note:

    • I was using a remote server for both the backend store and artifact storage, namely Postgres and MinIO.
    • For a remote backend, backend-store-uri is a must; otherwise you will not be able to start up your mlflow server.
  3. The answer from @bk_ helped me. I ended up with the following command to get my Tracking Server running with a proxied connection for artifact storage:

    mlflow server \
    --backend-store-uri postgresql://postgres:postgres@postgres:5432/mlflow \
    --default-artifact-root mlflow-artifacts:/ \
    --serve-artifacts \
    --host 0.0.0.0
    