
We want to deploy a trained TensorFlow model to AWS SageMaker for inference with a tensorflow-serving-container. The TensorFlow version is 2.1. Following the guide at https://github.com/aws/sagemaker-tensorflow-serving-container, the following steps have been taken:

  1. Built the TF 2.1 serving Docker image and published it to AWS ECR after successful local testing.
  2. Set the SageMaker execution role permissions for S3 and ECR.
  3. Packed the saved TF model folder (saved_model.pb, assets, variables) into model.tar.gz (a packing sketch follows after this list).
  4. Created an endpoint with a real-time predictor:
import os
import sagemaker
from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow.model import TensorFlowModel
from sagemaker.predictor import json_deserializer, json_serializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_JSON

def create_tfs_sagemaker_model():
    sagemaker_session = sagemaker.Session()
    role = 'arn:aws:iam::XXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXX'
    bucket = 'tf-serving'
    prefix = 'sagemaker/tfs-test'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    image = 'XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    endpoint_name = 'tf-serving-ep-test-1'
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, framework_version='2.1')
    tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
    rt_predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer, content_type=CONTENT_TYPE_JSON, accept=CONTENT_TYPE_JSON)
  5. Created a batch-transform job:
def create_tfs_sagemaker_batch_transform():
    sagemaker_session = sagemaker.Session()
    print(sagemaker_session.boto_region_name)
    role = 'arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXX'
    bucket = 'XXXXXXX-tf-serving'
    prefix = 'sagemaker/tfs-test'
    image = 'XXXXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, name='deep-net-0', framework_version='2.1')
    print(tensorflow_serving_model.model_data)
    out_path = 's3://XXXXXX-serving-out/'
    input_path = "s3://XXXXXX-serving-in/"    
    tensorflow_serving_transformer = tensorflow_serving_model.transformer(instance_count=1, instance_type='ml.c4.xlarge', accept='application/json', output_path=out_path)
    tensorflow_serving_transformer.transform(input_path, content_type='application/json')
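
For step 3, a minimal packing sketch (the export/ folder name is hypothetical). TF Serving looks for numbered version subdirectories under the model base path, which matches the /opt/ml/model/1 path that appears in the logs below:

import tarfile

def pack_saved_model(export_dir='export', archive='model.tar.gz'):
    # Place saved_model.pb, assets/ and variables/ under a "1/" version
    # directory inside the archive, so TF Serving finds them at /opt/ml/model/1.
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(export_dir, arcname='1')

pack_saved_model()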

Both steps 4 and 5 run, and in the AWS CloudWatch logs we see the instances starting successfully, the model loading, and TF Serving entering the event loop (see below):

2020-07-08T17:07:16.156+02:00 INFO:main:starting services

2020-07-08T17:07:16.156+02:00 INFO:main:nginx config:

2020-07-08T17:07:16.156+02:00 load_module
modules/ngx_http_js_module.so;

2020-07-08T17:07:16.156+02:00 worker_processes auto;

2020-07-08T17:07:16.156+02:00 daemon off;

2020-07-08T17:07:16.156+02:00 pid /tmp/nginx.pid;

2020-07-08T17:07:16.157+02:00 error_log /dev/stderr error;

2020-07-08T17:07:16.157+02:00 worker_rlimit_nofile 4096;

2020-07-08T17:07:16.157+02:00 events { worker_connections 2048;

2020-07-08T17:07:16.157+02:00 }

2020-07-08T17:07:16.162+02:00 http { include /etc/nginx/mime.types;
default_type application/json; access_log /dev/stdout combined;
js_include tensorflow-serving.js; upstream tfs_upstream { server
localhost:10001; } upstream gunicorn_upstream { server
unix:/tmp/gunicorn.sock fail_timeout=1; } server { listen 8080
deferred; client_max_body_size 0; client_body_buffer_size 100m;
subrequest_output_buffer_size 100m; set $tfs_version 2.1; set
$default_tfs_model None; location /tfs { rewrite ^/tfs/(.*) /$1 break;
proxy_redirect off; proxy_pass_request_headers off; proxy_set_header
Content-Type 'application/json'; proxy_set_header Accept
'application/json'; proxy_pass http://tfs_upstream; } location /ping {
js_content ping; } location /invocations { js_content invocations; }
location /models { proxy_pass http://gunicorn_upstream/models; }
location / { return 404 '{"error": "Not Found"}'; } keepalive_timeout
3; }

2020-07-08T17:07:16.162+02:00 }

2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:using default model name:
model

2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:tensorflow serving model
config:

2020-07-08T17:07:16.162+02:00 model_config_list: { config: { name:
"model", base_path: "/opt/ml/model", model_platform: "tensorflow" }

2020-07-08T17:07:16.162+02:00 }

2020-07-08T17:07:16.162+02:00 INFO:main:using default model name:
model

2020-07-08T17:07:16.162+02:00 INFO:main:tensorflow serving model
config:

2020-07-08T17:07:16.163+02:00 model_config_list: { config: { name:
"model", base_path: "/opt/ml/model", model_platform: "tensorflow" }

2020-07-08T17:07:16.163+02:00 }

2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow version info:

2020-07-08T17:07:16.163+02:00 TensorFlow ModelServer:
2.1.0-rc1+dev.sha.075ffcf

2020-07-08T17:07:16.163+02:00 TensorFlow Library: 2.1.0

2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow serving
command: tensorflow_model_server --port=10000 --rest_api_port=10001
--model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0

2020-07-08T17:07:16.163+02:00 INFO:main:started tensorflow serving
(pid: 13)

2020-07-08T17:07:16.163+02:00 INFO:main:nginx version info:

2020-07-08T17:07:16.163+02:00 nginx version: nginx/1.18.0

2020-07-08T17:07:16.163+02:00 built by gcc 7.4.0 (Ubuntu
7.4.0-1ubuntu1~18.04.1)

2020-07-08T17:07:16.163+02:00 built with OpenSSL 1.1.1 11 Sep 2018

2020-07-08T17:07:16.163+02:00 TLS SNI support enabled

2020-07-08T17:07:16.163+02:00 configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

2020-07-08T17:07:16.163+02:00 INFO:main:started nginx (pid: 15)

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075708: I
tensorflow_serving/model_servers/server_core.cc:462] Adding/updating
models.

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075760: I
tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding
model: model

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180755: I
tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources
for servable: {name: model version: 1} exhausted max_num_retries: 0

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180887: I
tensorflow_serving/core/basic_manager.cc:739] Successfully reserved
resources to load servable {name: model version: 1}

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180919: I
tensorflow_serving/core/loader_harness.cc:66] Approving load for
servable version {name: model version: 1}

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180944: I
tensorflow_serving/core/loader_harness.cc:74] Loading servable version
{name: model version: 1}

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180995: I
external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/1

2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.205712: I
external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }

2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.205825: I
external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/1

2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.208599: I
external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using
inter_op_parallelism_threads for best performance.

2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.328057: I
external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.578796: I
external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path:
/opt/ml/model/1

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.626494: I
external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 1445495
microseconds.

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.630443: I
tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No
warmup data file found at
/opt/ml/model/1/assets.extra/tf_serving_warmup_requests

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632461: I
tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable:
{name: model version: 1} exhausted max_num_retries: 0

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632484: I
tensorflow_serving/core/loader_harness.cc:87] Successfully loaded
servable version {name: model version: 1}

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.634727: I
tensorflow_serving/model_servers/server.cc:362] Running gRPC
ModelServer at 0.0.0.0:10000 …

2020-07-08T17:07:17.165+02:00 [warn] getaddrinfo: address family for
nodename not supported

2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.635747: I
tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST
API at:localhost:10001 …

2020-07-08T17:07:17.165+02:00 [evhttp_server.cc : 238] NET_LOG:
Entering the event loop …

But both the endpoint and the batch transform fail the SageMaker ping health check with:

2020-07-08T17:07:32.169+02:00 2020/07/08 15:07:31 [error] 16#16: *1
js: failed ping{ "error": "Could not find any versions of model None"
}

2020-07-08T17:07:32.170+02:00
169.254.255.130 - - [08/Jul/2020:15:07:31 +0000] "GET /ping HTTP/1.1" 502 157 "-" "Go-http-client/1.1"

Also, when tested locally with the self-built Docker TF Serving container, the model runs without problems and can be queried with curl, along the lines of the rough sketch below.
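
A rough sketch of that local check (assuming the container is running locally and publishing port 8080; the /ping and /invocations routes come from the nginx config shown in the logs above, and the payload shape is hypothetical):

import requests

base = 'http://localhost:8080'

# Health check - expect HTTP 200 when the container is healthy
print(requests.get(base + '/ping').status_code)

# Inference request - the 'instances' payload is a placeholder for the model's real input
resp = requests.post(
    base + '/invocations',
    json={'instances': [[0.1, 0.2, 0.3]]},
    headers={'Accept': 'application/json'},
)
print(resp.status_code, resp.text)
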
What could be the issue?

2 Answers


  1. Chosen as BEST ANSWER

    The solution to the problem is as follows:

    The environment variable "SAGEMAKER_TFS_DEFAULT_MODEL_NAME" needs to be set to the correct model name, e.g. "model":

    import os
    import sagemaker
    from sagemaker.tensorflow.serving import Model
    from sagemaker.tensorflow.model import TensorFlowModel
    from sagemaker.predictor import json_deserializer, json_serializer, RealTimePredictor
    from sagemaker.content_types import CONTENT_TYPE_JSON
    
    def create_tfs_sagemaker_model():
        sagemaker_session = sagemaker.Session()
        role = 'arn:aws:iam::XXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXX'
        bucket = 'tf-serving'
        prefix = 'sagemaker/tfs-test'
        s3_path = 's3://{}/{}'.format(bucket, prefix)
        image = 'XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
        model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
        endpoint_name = 'tf-serving-ep-test-1'
        env = {"SAGEMAKER_TFS_DEFAULT_MODEL_NAME": "model"}
        tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, name='model', framework_version='2.1', env=env)
        tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
        rt_predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer, content_type=CONTENT_TYPE_JSON, accept=CONTENT_TYPE_JSON)
    

    This creates the endpoint correctly and passes the ping health check with:

    2020-07-16T12:08:20.654+02:00 10.32.0.2 - - [16/Jul/2020:10:08:20 +0000] "GET /ping HTTP/1.1" 200 0 "-" "AHC/2.0"
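
    For completeness, a hedged usage sketch for querying the now-healthy endpoint by name; the 'instances' payload and its shape are placeholders that depend on the model's serving signature:

    import sagemaker
    from sagemaker.predictor import json_serializer, RealTimePredictor
    from sagemaker.content_types import CONTENT_TYPE_JSON

    predictor = RealTimePredictor(
        endpoint='tf-serving-ep-test-1',
        sagemaker_session=sagemaker.Session(),
        serializer=json_serializer,
        content_type=CONTENT_TYPE_JSON,
        accept=CONTENT_TYPE_JSON,
    )

    # Returns the raw JSON response from TF Serving (bytes), since no deserializer is set
    print(predictor.predict({'instances': [[0.1, 0.2, 0.3]]}))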


  2. It looks as though your model is named "model" in TensorFlow Serving, as shown earlier in the logs:

    2020-07-08T17:07:16.162+02:00 INFO:main:using default model name: model
    2020-07-08T17:07:16.162+02:00 INFO:main:tensorflow serving model config:
    

    but in the error, the ping check is getting routed to TensorFlow Serving for a model named "None":

    `Could not find any versions of model None`
    

    I'm not sure whether this error is happening in the Docker container or on the SageMaker side. But I did find a suspicious environment variable, TFS_DEFAULT_MODEL_NAME, which defaults to "None":

    class PythonServiceResource:
    
        def __init__(self):
            if SAGEMAKER_MULTI_MODEL_ENABLED:
                self._model_tfs_rest_port = {}
                self._model_tfs_grpc_port = {}
                self._model_tfs_pid = {}
                self._tfs_ports = self._parse_sagemaker_port_range(SAGEMAKER_TFS_PORT_RANGE)
            else:
                self._tfs_grpc_port = TFS_GRPC_PORT
                self._tfs_rest_port = TFS_REST_PORT
    
            self._tfs_enable_batching = SAGEMAKER_BATCHING_ENABLED == 'true'
            self._tfs_default_model_name = os.environ.get('TFS_DEFAULT_MODEL_NAME', "None")
    

    Could you try setting TFS_DEFAULT_MODEL_NAME in your container and see what happens?

    If that doesn't work, you might have more success filing a bug on the SageMaker TensorFlow Serving container GitHub repository. Amazon experts check it fairly regularly.

    BTW, I’d love to chat more about how you’re using SageMaker endpoints with TensorFlow models for some research I’m doing. If you’re up for it, shoot me an email at [email protected].
