
I want to enable data capture for a specific endpoint (so far, only via the console). The endpoint works fine and also logs & returns the desired results. However, no files are written to the specified S3 location.
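One way to confirm that nothing is being written is to list the objects under the capture location. The sketch below is a rough check, not an official procedure. It assumes that capture files land under `<destination>/<endpoint-name>/...`; the helper names `split_s3_uri` and `list_capture_keys` are hypothetical, and `boto3` is imported lazily so that the URI parsing can be used on its own.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_capture_keys(destination_s3_uri, endpoint_name, max_keys=20):
    """List up to max_keys object keys below the assumed capture prefix."""
    import boto3  # imported here so split_s3_uri works without boto3 installed

    bucket, prefix = split_s3_uri(destination_s3_uri)
    # Assumption: captured records are stored below <prefix>/<endpoint-name>/
    parts = [p for p in (prefix.rstrip("/"), endpoint_name) if p]
    full_prefix = "/".join(parts) + "/"

    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=full_prefix, MaxKeys=max_keys)
    # "Contents" is absent when no objects match the prefix
    return [obj["Key"] for obj in response.get("Contents", [])]
```

An empty list after a few invocations would suggest that capture is not writing to that location, although captured records may take a little while to appear.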

Endpoint Configuration

The endpoint is based on a training job with a scikit-learn classifier. It has a single variant running on an ml.m4.xlarge instance. Data capture is enabled with a sampling percentage of 100%. As the data capture storage location, I tried s3://<bucket-name> as well as s3://<bucket-name>/<some-other-path>. For "Capture content type", I tried leaving everything blank, setting text/csv under "CSV/Text", and setting application/json under "JSON".
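For comparison with the console settings, the same configuration can be written as the `DataCaptureConfig` structure that boto3's `create_endpoint_config` accepts. This is only a sketch of what the console settings above might correspond to: the helper name `build_data_capture_config` and the bucket name in the usage comment are hypothetical.

```python
def build_data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build a DataCaptureConfig dict for boto3's create_endpoint_config."""
    if not destination_s3_uri.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both the request and the response
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        # Treat application/json payloads as JSON, matching the invocation below
        "CaptureContentTypeHeader": {"JsonContentTypes": ["application/json"]},
    }


# Usage (requires AWS credentials; the bucket name is a placeholder):
# config = build_data_capture_config("s3://my-capture-bucket/capture")
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName=..., ProductionVariants=[...], DataCaptureConfig=config
# )
```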

Endpoint Invocation

The endpoint is invoked in a Lambda function with a client. Here’s the call:

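# Build the JSON request body expected by the model container
sagemaker_body_source = {
    "segments": segments,
    "language": language,
}
payload = json.dumps(sagemaker_body_source).encode()

# Call the endpoint, sending and requesting JSON
response = self.client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType='application/json',
    Accept='application/json',
)

# The response body is a stream; read and decode it before parsing
result = json.loads(response['Body'].read().decode())
return result["predictions"]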

Internally, the endpoint uses a Flask API with an /invocations path that returns the result.

Logs

The endpoint itself works fine and the Flask API is logging input and output:

INFO:api:body: {'segments': [<strings...>], 'language': 'de'}
INFO:api:output: {'predictions': [{'text': 'some text', 'label': 'some_label'}, ....]}

2 Answers


  1. Chosen as BEST ANSWER

    The issue turned out to be related to the IAM role. The default role (ModelEndpoint-Role) does not have permission to write files to S3. It worked via the SDK because the SDK uses a different role in SageMaker Studio. I did not receive any error message about this.
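A fix along these lines could be sketched as an inline policy that lets the endpoint's role write to the capture location. This is a minimal illustration, not a complete permission set: it assumes `s3:PutObject` on the capture prefix is the missing permission (an encrypted bucket, for example, might need more), and the helper names, policy name, and bucket name are hypothetical.

```python
import json


def build_capture_write_policy(bucket, prefix=""):
    """Return an IAM policy document allowing writes below bucket/prefix."""
    prefix = prefix.strip("/")
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],  # assumed minimum for data capture
                "Resource": object_arn,
            }
        ],
    }


def attach_capture_policy(role_name, policy, policy_name="DataCaptureS3Write"):
    """Attach the policy inline to the given role (needs IAM permissions)."""
    import boto3  # imported here so the policy builder works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


# Usage (the bucket name is a placeholder; the role name comes from the answer above):
# policy = build_capture_write_policy("my-capture-bucket", "capture")
# attach_capture_policy("ModelEndpoint-Role", policy)
```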


  2. Data capture can be enabled by using the SDK as shown below:

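    from sagemaker.model_monitor import DataCaptureConfig

    # model, endpoint_name and s3_capture_upload_path are defined elsewhere
    data_capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=s3_capture_upload_path,
    )

    # Deploy the model with data capture attached to the endpoint
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m4.xlarge",
        endpoint_name=endpoint_name,
        data_capture_config=data_capture_config,
    )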
    

    Make sure to reference your data capture config in your endpoint creation step. I have always seen this method work. Can you try this and let me know? Reference notebook

    NOTE: I work for AWS SageMaker, but my opinions are my own.
