I want to enable data capture for a specific endpoint (so far, only via the console). The endpoint works fine and also logs & returns the desired results. However, no files are written to the specified S3 location.
Endpoint Configuration
The endpoint is based on a training job with a scikit-learn classifier. It has only one variant, which runs on an ml.m4.xlarge instance. Data capture is enabled with a sampling percentage of 100%. As the data capture storage location I tried s3://<bucket-name> as well as s3://<bucket-name>/<some-other-path>. For the "Capture content type" I tried leaving everything blank, setting text/csv in "CSV/Text", and application/json in "JSON".
Endpoint Invocation
The endpoint is invoked from a Lambda function via a boto3 SageMaker Runtime client. Here's the call:
# self.client is a boto3 "sagemaker-runtime" client
sagemaker_body_source = {
    "segments": segments,
    "language": language,
}
payload = json.dumps(sagemaker_body_source).encode()
response = self.client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType='application/json',
    Accept='application/json',
)
result = json.loads(response['Body'].read().decode())
return result["predictions"]
Internally, the endpoint runs a Flask API with an /invocations path that returns the result.
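As a minimal sketch of such a serving app (the handler body and label here are hypothetical stand-ins for the real classifier; SageMaker hosting expects the container to serve /ping and /invocations):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check required by SageMaker hosting.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    body = request.get_json()
    # Hypothetical placeholder for the real prediction logic.
    predictions = [
        {"text": segment, "label": "some_label"}
        for segment in body["segments"]
    ]
    return jsonify({"predictions": predictions})
```

Note that data capture happens outside this app, at the hosting layer, so nothing in the Flask code itself needs to change for capture to work.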
Logs
The endpoint itself works fine and the Flask API is logging input and output:
INFO:api:body: {'segments': [<strings...>], 'language': 'de'}
INFO:api:output: {'predictions': [{'text': 'some text', 'label': 'some_label'}, ....]}
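For comparison, when capture does work, SageMaker writes JSON Lines files under the destination prefix at <endpoint-name>/<variant-name>/yyyy/mm/dd/hh/. A sketch of decoding one such record (the record content below is fabricated purely to illustrate the documented shape):

```python
import json

# One capture record in the documented JSON Lines shape
# (all field values here are made up for illustration).
record_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "application/json",
            "mode": "INPUT",
            "data": "{\"segments\": [\"some text\"], \"language\": \"de\"}",
            "encoding": "JSON",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"predictions\": [{\"text\": \"some text\", \"label\": \"some_label\"}]}",
            "encoding": "JSON",
        },
    },
    "eventMetadata": {"eventId": "00000000-0000-0000-0000-000000000000"},
    "eventVersion": "0",
})

def decode_capture_record(line):
    """Return the decoded input and output payloads of one capture record."""
    record = json.loads(line)
    capture = record["captureData"]
    return (
        json.loads(capture["endpointInput"]["data"]),
        json.loads(capture["endpointOutput"]["data"]),
    )
```

If no objects with this layout ever appear under the destination prefix, the capture pipeline is failing silently, which points away from the inference code itself.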
2 Answers
So the issue seemed to be related to the IAM role. The default role (ModelEndpoint-Role) does not have permission to write files to S3. It worked via the SDK because that path uses a different role in SageMaker Studio. I did not receive any error message about this.

Data capture can be enabled by using the SDK as shown below –
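A rough sketch of that SDK path, assuming the SageMaker Python SDK's DataCaptureConfig (the bucket, prefix, and `model` object are placeholders, not the answerer's actual snippet):

```python
from sagemaker.model_monitor import DataCaptureConfig

# Placeholder S3 destination; swap in your own bucket/prefix.
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/captured-data",
)

# Reference the capture config when the endpoint is created;
# `model` is a previously built sagemaker.model.Model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    data_capture_config=capture_config,
)
```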
Make sure to reference your data capture config in your endpoint creation step. I've always seen this method work. Can you try this and let me know? Reference notebook
NOTE – I work for AWS SageMaker, but my opinions are my own.
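Following up on the accepted fix: the endpoint's execution role needs S3 write permission on the capture destination. A minimal policy statement might look like this (bucket and prefix are placeholders):

```python
# Hypothetical IAM policy document granting the endpoint's execution
# role write access to the data-capture prefix.
capture_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/captured-data/*",
        }
    ],
}
```

Attach a statement like this to the role the endpoint actually runs under (visible in the endpoint's details), not the role your notebook or Studio session uses.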