I want to enable data capture for a specific endpoint (so far, only via the console). The endpoint works fine and also logs & returns the desired results. However, no files are written to the specified S3 location.
Endpoint Configuration
The endpoint is based on a training job with a scikit-learn classifier. It has only one variant, which runs on an ml.m4.xlarge instance. Data capture is enabled with a sampling percentage of 100%. As the data capture storage location I tried s3://<bucket-name> as well as s3://<bucket-name>/<some-other-path>. For the "Capture content type" I tried leaving everything blank, setting text/csv in "CSV/Text", and application/json in "JSON".
Endpoint Invocation
The endpoint is invoked from a Lambda function via a boto3 SageMaker Runtime client. Here's the call:
# self.client is a boto3 "sagemaker-runtime" client
sagemaker_body_source = {
    "segments": segments,
    "language": language,
}
payload = json.dumps(sagemaker_body_source).encode()
response = self.client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType='application/json',
    Accept='application/json',
)
result = json.loads(response['Body'].read().decode())
return result["predictions"]
Internally, the endpoint runs a Flask API with an /invocations path that returns the result.
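As a minimal sketch of such a serving app (the handler body and label here are hypothetical stand-ins for the real classifier; SageMaker hosting expects the container to serve /ping and /invocations):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check required by SageMaker hosting.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    body = request.get_json()
    # Hypothetical placeholder for the real prediction logic.
    predictions = [
        {"text": segment, "label": "some_label"}
        for segment in body["segments"]
    ]
    return jsonify({"predictions": predictions})
```

Note that data capture happens outside this app, at the hosting layer, so nothing in the Flask code itself needs to change for capture to work.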
Logs
The endpoint itself works fine and the Flask API is logging input and output:
INFO:api:body: {'segments': [<strings...>], 'language': 'de'}
INFO:api:output: {'predictions': [{'text': 'some text', 'label': 'some_label'}, ....]}
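For comparison, when capture does work, SageMaker writes JSON Lines files under the destination prefix at <endpoint-name>/<variant-name>/yyyy/mm/dd/hh/. A sketch of decoding one such record (the record content below is fabricated purely to illustrate the documented shape):

```python
import json

# One capture record in the documented JSON Lines shape
# (all field values here are made up for illustration).
record_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "application/json",
            "mode": "INPUT",
            "data": "{\"segments\": [\"some text\"], \"language\": \"de\"}",
            "encoding": "JSON",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"predictions\": [{\"text\": \"some text\", \"label\": \"some_label\"}]}",
            "encoding": "JSON",
        },
    },
    "eventMetadata": {"eventId": "00000000-0000-0000-0000-000000000000"},
    "eventVersion": "0",
})

def decode_capture_record(line):
    """Return the decoded input and output payloads of one capture record."""
    record = json.loads(line)
    capture = record["captureData"]
    return (
        json.loads(capture["endpointInput"]["data"]),
        json.loads(capture["endpointOutput"]["data"]),
    )
```

If no objects with this layout ever appear under the destination prefix, the capture pipeline is failing silently, which points away from the inference code itself.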
2 Answers
So the issue seemed to be related to the IAM role. The default role (ModelEndpoint-Role) does not have permission to write files to S3. It worked via the SDK because that path uses a different role in SageMaker Studio. I did not receive any error message about this.

Data capture can be enabled by using the SDK as shown below –
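A rough sketch of that SDK path, assuming the SageMaker Python SDK's DataCaptureConfig (the bucket, prefix, and `model` object are placeholders, not the answerer's actual snippet):

```python
from sagemaker.model_monitor import DataCaptureConfig

# Placeholder S3 destination; swap in your own bucket/prefix.
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/captured-data",
)

# Reference the capture config when the endpoint is created;
# `model` is a previously built sagemaker.model.Model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    data_capture_config=capture_config,
)
```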
Make sure to reference your data capture config in your endpoint creation step. I've always seen this method work. Can you try this and let me know? Reference notebook
NOTE – I work for AWS SageMaker, but my opinions are my own.
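Following up on the accepted fix: the endpoint's execution role needs S3 write permission on the capture destination. A minimal policy statement might look like this (bucket and prefix are placeholders):

```python
# Hypothetical IAM policy document granting the endpoint's execution
# role write access to the data-capture prefix.
capture_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/captured-data/*",
        }
    ],
}
```

Attach a statement like this to the role the endpoint actually runs under (visible in the endpoint's details), not the role your notebook or Studio session uses.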