We want to deploy a trained TensorFlow model to AWS SageMaker for inference with a tensorflow-serving-container. The TensorFlow version is 2.1. Following the guide at https://github.com/aws/sagemaker-tensorflow-serving-container, the following steps have been taken:
- Built the TF 2.1 serving Docker image and pushed it to AWS ECR after successful local testing
- Set the SageMaker execution role permissions for S3 and ECR
- Packed the saved TF model folder (saved_model.pb, assets, variables) into model.tar.gz
- Created an endpoint with a real-time predictor:
import os
import sagemaker
from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow.model import TensorFlowModel
from sagemaker.predictor import json_deserializer, json_serializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_JSON

def create_tfs_sagemaker_model():
    sagemaker_session = sagemaker.Session()
    role = 'arn:aws:iam::XXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXX'
    bucket = 'tf-serving'
    prefix = 'sagemaker/tfs-test'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    image = 'XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    # Upload the packed SavedModel to S3 and deploy it behind a real-time endpoint
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    endpoint_name = 'tf-serving-ep-test-1'
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, framework_version='2.1')
    tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
    rt_predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer, content_type=CONTENT_TYPE_JSON, accept=CONTENT_TYPE_JSON)
    return rt_predictor
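For completeness, the predictor returned above would be queried roughly like this (a sketch; the "instances" payload is a placeholder that must match the model's serving signature):

# Hypothetical request against the endpoint created above.
rt_predictor = create_tfs_sagemaker_model()
sample_input = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # placeholder input
result = rt_predictor.predict(sample_input)           # raw JSON response body
print(result)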
- Created a batch transform job:
def create_tfs_sagemaker_batch_transform():
    sagemaker_session = sagemaker.Session()
    print(sagemaker_session.boto_region_name)
    role = 'arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXX'
    bucket = 'XXXXXXX-tf-serving'
    prefix = 'sagemaker/tfs-test'
    image = 'XXXXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, name='deep-net-0', framework_version='2.1')
    print(tensorflow_serving_model.model_data)
    out_path = 's3://XXXXXX-serving-out/'
    input_path = 's3://XXXXXX-serving-in/'
    # Run the uploaded model against all JSON files in the input bucket
    tensorflow_serving_transformer = tensorflow_serving_model.transformer(instance_count=1, instance_type='ml.c4.xlarge', accept='application/json', output_path=out_path)
    tensorflow_serving_transformer.transform(input_path, content_type='application/json')
    return tensorflow_serving_transformer
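The transform job runs asynchronously; for reference, it can be waited on and its output inspected roughly as follows (a sketch; the output bucket name is a placeholder and boto3 credentials are assumed to be configured):

# Block until the batch transform job finishes, then list the result objects.
import boto3

transformer = create_tfs_sagemaker_batch_transform()
transformer.wait()  # raises if the job fails

s3 = boto3.client('s3')
listing = s3.list_objects_v2(Bucket='XXXXXX-serving-out')
for obj in listing.get('Contents', []):
    print(obj['Key'])  # one <input-file>.out object per input file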
Both step 4 and step 5 run, and in the AWS CloudWatch logs we see the instances starting successfully, the model being loaded, and TF Serving entering its event loop; see below:
2020-07-08T17:07:16.156+02:00 INFO:main:starting services
2020-07-08T17:07:16.156+02:00 INFO:main:nginx config:
2020-07-08T17:07:16.156+02:00 load_module modules/ngx_http_js_module.so;
2020-07-08T17:07:16.156+02:00 worker_processes auto;
2020-07-08T17:07:16.156+02:00 daemon off;
2020-07-08T17:07:16.156+02:00 pid /tmp/nginx.pid;
2020-07-08T17:07:16.157+02:00 error_log /dev/stderr error;
2020-07-08T17:07:16.157+02:00 worker_rlimit_nofile 4096;
2020-07-08T17:07:16.157+02:00 events { worker_connections 2048; }
2020-07-08T17:07:16.162+02:00 http { include /etc/nginx/mime.types; default_type application/json; access_log /dev/stdout combined; js_include tensorflow-serving.js; upstream tfs_upstream { server localhost:10001; } upstream gunicorn_upstream { server unix:/tmp/gunicorn.sock fail_timeout=1; } server { listen 8080 deferred; client_max_body_size 0; client_body_buffer_size 100m; subrequest_output_buffer_size 100m; set $tfs_version 2.1; set $default_tfs_model None; location /tfs { rewrite ^/tfs/(.*) /$1 break; proxy_redirect off; proxy_pass_request_headers off; proxy_set_header Content-Type 'application/json'; proxy_set_header Accept 'application/json'; proxy_pass http://tfs_upstream; } location /ping { js_content ping; } location /invocations { js_content invocations; } location /models { proxy_pass http://gunicorn_upstream/models; } location / { return 404 '{"error": "Not Found"}'; } keepalive_timeout 3; } }
2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:using default model name: model
2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:tensorflow serving model config:
2020-07-08T17:07:16.162+02:00 model_config_list: { config: { name: "model", base_path: "/opt/ml/model", model_platform: "tensorflow" } }
2020-07-08T17:07:16.162+02:00 INFO:main:using default model name: model
2020-07-08T17:07:16.162+02:00 INFO:main:tensorflow serving model config:
2020-07-08T17:07:16.163+02:00 model_config_list: { config: { name: "model", base_path: "/opt/ml/model", model_platform: "tensorflow" } }
2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow version info:
2020-07-08T17:07:16.163+02:00 TensorFlow ModelServer: 2.1.0-rc1+dev.sha.075ffcf
2020-07-08T17:07:16.163+02:00 TensorFlow Library: 2.1.0
2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow serving command: tensorflow_model_server --port=10000 --rest_api_port=10001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0
2020-07-08T17:07:16.163+02:00 INFO:main:started tensorflow serving (pid: 13)
2020-07-08T17:07:16.163+02:00 INFO:main:nginx version info:
2020-07-08T17:07:16.163+02:00 nginx version: nginx/1.18.0
2020-07-08T17:07:16.163+02:00 built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
2020-07-08T17:07:16.163+02:00 built with OpenSSL 1.1.1 11 Sep 2018
2020-07-08T17:07:16.163+02:00 TLS SNI support enabled
2020-07-08T17:07:16.163+02:00 configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
2020-07-08T17:07:16.163+02:00 INFO:main:started nginx (pid: 15)
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075708: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075760: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: model
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180755: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 1} exhausted max_num_retries: 0
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180887: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180919: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180944: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180995: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/1
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.205712: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.205825: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/1
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.208599: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.328057: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.578796: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /opt/ml/model/1
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.626494: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 1445495 microseconds.
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.630443: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/1/assets.extra/tf_serving_warmup_requests
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632461: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 1} exhausted max_num_retries: 0
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632484: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 1}
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.634727: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:10000 ...
2020-07-08T17:07:17.165+02:00 [warn] getaddrinfo: address family for nodename not supported
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.635747: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:10001 ...
2020-07-08T17:07:17.165+02:00 [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
But both the endpoint and the batch transform job fail the SageMaker ping health check with:
2020-07-08T17:07:32.169+02:00 2020/07/08 15:07:31 [error] 16#16: *1 js: failed ping{ "error": "Could not find any versions of model None" }
2020-07-08T17:07:32.170+02:00 169.254.255.130 - - [08/Jul/2020:15:07:31 +0000] "GET /ping HTTP/1.1" 502 157 "-" "Go-http-client/1.1"
Also, when tested locally with the self-built Docker tf-serving-container, the model runs without problems and can be queried with curl.
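(A rough Python equivalent of that local curl check, assuming the locally started container maps its HTTP port to localhost:8080 and serves /invocations there, would be:)

# Assumption: the self-built container is running locally on port 8080.
import json
import requests

payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # placeholder input
response = requests.post('http://localhost:8080/invocations',
                         data=json.dumps(payload),
                         headers={'Content-Type': 'application/json'})
print(response.status_code, response.text)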
What could be the issue?
2 Answers
The solution to the problem is as follows:
The environment variable "SAGEMAKER_TFS_DEFAULT_MODEL_NAME" needs to be set to the correct model name, e.g. "model".
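In the deployment code from the question this can be done, for example, by passing an env dict to the Model constructor (a sketch; placeholders as above, and 'model' must match the model name shown in the TF Serving logs):

# Pass the default model name to the serving container via its environment.
tensorflow_serving_model = Model(model_data=model_data,
                                 role=role,
                                 sagemaker_session=sagemaker_session,
                                 image=image,
                                 framework_version='2.1',
                                 env={'SAGEMAKER_TFS_DEFAULT_MODEL_NAME': 'model'})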
This creates the endpoint correctly and passes the ping health check with:
2020-07-16T12:08:20.654+02:00 10.32.0.2 - - [16/Jul/2020:10:08:20 +0000] "GET /ping HTTP/1.1" 200 0 "-" "AHC/2.0"
It looks as though your model is named "model" in the TensorFlow Serving logs earlier, but in the error the ping check is getting routed to TensorFlow Serving for a model named "None".
I'm not sure if this error is happening due to the Docker container or on the SageMaker side. But I did find a suspicious environment variable, TFS_DEFAULT_MODEL_NAME, that is set to "None" by default. Could you try setting TFS_DEFAULT_MODEL_NAME in your container and see what happens?
If that doesn't work, you might have some more success posting a bug on the TensorFlow SageMaker container GitHub. Amazon experts check that fairly regularly.
BTW, I'd love to chat more about how you're using SageMaker endpoints with TensorFlow models for some research I'm doing. If you're up for it, shoot me an email at [email protected].