I’m trying to deploy my custom-trained model using a custom container, i.e. create an endpoint from a model that I created.
I’m doing the same thing with AI Platform (same model & container) and it works fine there.
On my first try the model deployed successfully, but ever since, whenever I try to create an endpoint it stays in "deploying" for over an hour and then fails with the following error:
google.api_core.exceptions.FailedPrecondition: 400 Error: model server never became ready. Please validate that your model file or container configuration are valid. Model server logs can be found at (link)
The log shows the following:
 * Running on all addresses (0.0.0.0)
WARNING: This is a development server. Do not use it in a production deployment.
 * Running on http://127.0.0.1:8080
[05/Jul/2022 12:00:37] "GET /v1/endpoints/1/deployedModels/2025850174177280000 HTTP/1.1" 404 -
[05/Jul/2022 12:00:38] "GET /v1/endpoints/1/deployedModels/2025850174177280000 HTTP/1.1" 404 -
The last line repeats continuously until the deployment ultimately fails.
My Flask app is as follows:
import base64
import os
import pickle
from typing import Any, Dict

from flask import Flask, jsonify, request

from streamliner.models.general_model import GeneralModel


class Predictor:
    def __init__(self, model: GeneralModel):
        self._model = model

    def predict(self, instance: str) -> Dict[str, Any]:
        # Each instance is a base64-encoded pickled features DataFrame.
        decoded_pickle = base64.b64decode(instance)
        features_df = pickle.loads(decoded_pickle)
        prediction = self._model.predict(features_df).tolist()
        return {"prediction": prediction}


app = Flask(__name__)

# Load the model once at startup; the pickle is baked into the image.
with open('./model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)
predictor = Predictor(model=model)


@app.route("/predict", methods=['POST'])
def predict() -> Any:
    if request.method == "POST":
        instance = request.get_json()
        instance = instance['instances'][0]
        predictions = predictor.predict(instance)
        return jsonify(predictions)


@app.route("/health")
def health() -> str:
    return "ok"


if __name__ == '__main__':
    port = int(os.environ.get("PORT", 8080))
    app.run(host='0.0.0.0', port=port)
The deployment code, which I run through Python, is irrelevant here: the problem persists when I deploy through GCP’s UI.
The model creation code is as follows:
# Imports assumed by this snippet (google-cloud-aiplatform):
from google.api_core.exceptions import NotFound
from google.cloud.aiplatform_v1.types import model_service


def upload_model(self):
    model = {
        "name": self.model_name_on_platform,
        "display_name": self.model_name_on_platform,
        "version_aliases": ["default", self.run_id],
        "container_spec": {
            "image_uri": f'{REGION}-docker.pkg.dev/{GCP_PROJECT_ID}/{self.repository_name}/{self.run_id}',
            "predict_route": "/predict",
            "health_route": "/health",
        },
    }
    parent = self.model_service_client.common_location_path(
        project=GCP_PROJECT_ID, location=REGION
    )
    model_path = self.model_service_client.model_path(
        project=GCP_PROJECT_ID, location=REGION, model=self.model_name_on_platform
    )
    upload_model_request_specifications = {
        'parent': parent,
        'model': model,
        'model_id': self.model_name_on_platform,
    }
    try:
        print("trying to get model")
        self.get_model(model_path=model_path)
    except NotFound:
        print("didn't find model, creating a new one")
    else:
        print("found an existing model, creating a new version under it")
        upload_model_request_specifications['parent_model'] = model_path

    upload_model_request = model_service.UploadModelRequest(upload_model_request_specifications)
    response = self.model_service_client.upload_model(request=upload_model_request, timeout=1800)
    print("Long running operation:", response.operation.name)
    upload_model_response = response.result(timeout=1800)
    print("upload_model_response:", upload_model_response)
My problem is very close to this one, except that I do have a health check.
Why would it work on the first deployment and fail ever since? Why would it work on AI Platform but fail on Vertex AI?
2 Answers
This issue could be due to several different reasons:

- Validate the container configuration port; it should use port 8080. This configuration is important because Vertex AI sends liveness checks, health checks, and prediction requests to this port on the container. You can see this document about containers, and this other one about custom containers. A minimal server sketch follows this list.
- Another possible reason is quota limits, which may need to be increased. You can verify this using this document.
- In the health and predict routes, use the MODEL_NAME you are using, as in this example.
- Validate that the account you are using has enough permissions to read your project’s GCS bucket.
- Validate the model location; it should be the correct path.

If none of the suggestions above work, you will need to contact GCP Support by creating a Support Case; it’s impossible for the community to troubleshoot this further without access to internal GCP resources.
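To illustrate the port and route points above, here is a minimal sketch of a Flask server that reads the port and routes Vertex AI injects into custom containers through the documented AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE environment variables; the handler bodies are placeholders, not a real model server:

import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# Vertex AI injects these variables into custom containers; the fallbacks
# only matter when running the server locally.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))


@app.route(HEALTH_ROUTE)
def health() -> str:
    # Any 200 response satisfies the health checks.
    return "ok"


@app.route(PREDICT_ROUTE, methods=["POST"])
def predict():
    instances = request.get_json()["instances"]
    # Placeholder: echo the instances back; a real server would call the model.
    return jsonify({"predictions": instances})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)

Registering the routes from the AIP_* variables instead of hard-coding them keeps the server consistent with whatever predict_route and health_route are set in the container spec.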
In case you haven’t yet found a solution, you can try out custom prediction routines. They are really helpful because they strip away the need to write the server part of the code and let you focus solely on the logic of your ML model and any pre- or post-processing. Here is a codelab to help you out: https://codelabs.developers.google.com/vertex-cpr-sklearn#0. Hope this helps.
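As a rough illustration (not the codelab’s exact code), a custom prediction routine can look like the sketch below, modeled on the Predictor interface in the google-cloud-aiplatform SDK; the artifact name model.pkl, the source directory, and the image URI are assumptions:

import pickle

from google.cloud.aiplatform.prediction import LocalModel
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class MyPredictor(Predictor):
    """Only the ML logic lives here; CPR supplies the model server."""

    def load(self, artifacts_uri: str) -> None:
        # Copies the model artifacts from GCS into the working directory.
        prediction_utils.download_model_artifacts(artifacts_uri)
        with open("model.pkl", "rb") as f:  # artifact name is an assumption
            self._model = pickle.load(f)

    def predict(self, instances):
        # Receives the parsed request body, i.e. {"instances": [...]}.
        predictions = self._model.predict(instances["instances"]).tolist()
        return {"predictions": predictions}


# Build a serving image from the predictor; the URI and paths are hypothetical.
local_model = LocalModel.build_cpr_model(
    "src/",
    "us-central1-docker.pkg.dev/my-project/my-repo/cpr-image",
    predictor=MyPredictor,
    requirements_path="src/requirements.txt",
)

The built image can then be pushed and uploaded as a Vertex AI model, so you never write or debug the Flask/route/port layer yourself.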