I’m also facing the same issue. If I deploy a single container on an endpoint, it works perfectly.
But when I try to deploy multiple containers on one endpoint, the serve script never reaches the ping function, so the container fails its ping health check.
Any suggestions would be appreciated.
Thanks
The issue has been resolved. In a SageMaker multi-container endpoint, each container must listen on the port given in the SAGEMAKER_BIND_TO_PORT environment variable rather than on a fixed port. When I printed the value of SAGEMAKER_BIND_TO_PORT to CloudWatch, I saw a different, seemingly random port each time.
So I wrote code that replaces the hard-coded port 8080 in the nginx.conf file with the value of the SAGEMAKER_BIND_TO_PORT environment variable.
That solved the problem.
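A minimal sketch of that port replacement in Python, assuming the common bring-your-own-container layout where nginx.conf lives at /opt/program/nginx.conf and contains a `listen 8080` directive. The path, the directive format, and the helper names `patch_nginx_port` and `rewrite_nginx_conf` are all assumptions for illustration; adjust them to your image. It falls back to 8080 when the variable is unset, so the same image still works on a single-container endpoint.

```python
import os
import re

# Assumed config location (hypothetical); change it to wherever your image keeps nginx.conf.
NGINX_CONF = "/opt/program/nginx.conf"
# Port used by single-container endpoints, kept as the fallback.
DEFAULT_PORT = "8080"


def patch_nginx_port(conf_text: str, port: str) -> str:
    """Replace the port on every `listen 8080` directive with `port`.

    Word boundaries keep ports such as 18080 untouched, and trailing
    options like `deferred` are preserved.
    """
    return re.sub(r"(\blisten\s+)8080\b", rf"\g<1>{port}", conf_text)


def rewrite_nginx_conf(path: str = NGINX_CONF) -> str:
    """Patch nginx.conf in place with the port SageMaker assigned; return that port."""
    # SAGEMAKER_BIND_TO_PORT is set on multi-container endpoints.
    port = os.environ.get("SAGEMAKER_BIND_TO_PORT", DEFAULT_PORT)
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_nginx_port(text, port))
    # Container stdout ends up in the endpoint's CloudWatch log stream.
    print(f"nginx configured to listen on port {port}", flush=True)
    return port
```

Call `rewrite_nginx_conf()` in the serve script before nginx is started, so nginx reads the patched port when it loads its configuration.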
Thanks.
Could you share your logs? Please make sure your directory structure and Dockerfile follow this layout: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Multi-Container/mce-custom. Also make sure your Dockerfile installs your dependencies correctly and that the image builds. Finally, check that the /ping health-check route in your predictor.py loads your model correctly. A health-check failure is usually caused by the model not loading properly, and without a passing health check the endpoint will fail to create. In general, I would add logging statements to predictor.py to get a clearer picture of what your inference code is doing.
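Below is a rough sketch of a logged health check. It assumes a pickled model at /opt/ml/model/model.pkl, which is a hypothetical file name (SageMaker extracts model artifacts under /opt/ml/model, but the file name depends on how you packaged the model). `ScoringService` and `ping_status` are illustrative names. The 200/404 status choice is also an assumption; SageMaker only requires a 200 for a healthy container. The code is framework-agnostic so that the logging itself stays the focus.

```python
import logging
import os
import pickle
import sys

# Log to stdout so messages appear in the endpoint's CloudWatch logs.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("predictor")

# Hypothetical artifact path; adjust the file name to your model package.
MODEL_PATH = os.path.join("/opt/ml/model", "model.pkl")


class ScoringService:
    model = None  # cached after the first successful load

    @classmethod
    def get_model(cls, path=MODEL_PATH):
        """Load the model once, logging success or the full failure traceback."""
        if cls.model is None:
            logger.info("Loading model from %s", path)
            try:
                with open(path, "rb") as f:
                    cls.model = pickle.load(f)
                logger.info("Model loaded: %s", type(cls.model).__name__)
            except Exception:
                # logger.exception also records the traceback.
                logger.exception("Failed to load model from %s", path)
        return cls.model


def ping_status(path=MODEL_PATH):
    """Return the HTTP status /ping should send: 200 only if the model is loaded."""
    healthy = ScoringService.get_model(path) is not None
    logger.info("Health check: %s", "healthy" if healthy else "unhealthy")
    return 200 if healthy else 404
```

In a Flask-based predictor.py, the /ping route could return `flask.Response(status=ping_status())`. The traceback logged on a failed load usually shows exactly why the health check is failing.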