I’m trying to run an Azure ML Batch endpoint job, but the job always ends with an error because of the input (see below). I used a model created and trained in the Azure designer as described on the page: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-model-designer?view=azureml-api-1
The error in "logs/azureml/stderrlogs.txt" is:
TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch
My scoring script (auto-generated for the model):
import os
import json
from typing import List
from azureml.studio.core.io.model_directory import ModelDirectory
from pathlib import Path
from azureml.studio.modules.ml.score.score_generic_module.score_generic_module import ScoreModelModule
from azureml.designer.serving.dagengine.converter import create_dfd_from_dict
from collections import defaultdict
from azureml.designer.serving.dagengine.utils import decode_nan
from azureml.studio.common.datatable.data_table import DataTable
model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'trained_model_outputs')
schema_file_path = Path(model_path) / '_schema.json'
with open(schema_file_path) as fp:
    schema_data = json.load(fp)

def init():
    global model
    model = ModelDirectory.load(model_path).model

def run(data):
    data = json.loads(data)
    input_entry = defaultdict(list)
    for row in data:
        for key, val in row.items():
            input_entry[key].append(decode_nan(val))
    data_frame_directory = create_dfd_from_dict(input_entry, schema_data)
    score_module = ScoreModelModule()
    result, = score_module.run(
        learner=model,
        test_data=DataTable.from_dfd(data_frame_directory),
        append_or_result_only=True)
    return json.dumps({"result": result.data_frame.values.tolist()})
Definition of the input:
input = Input(type=AssetTypes.URI_FILE, path="azureml://subscriptions/$$$$$$$$/resourcegroups/$$$$$$$$$/workspaces/$$$$$/datastores/workspaceblobstore/paths/UI/2023-08-24_193934_UTC/samples.json")
Definition of the job:
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=input,
)
I’ve read/watched various tutorials/documentation and tried solutions from them, but nothing helped and I’ve been stuck with this error for several hours, so I’m asking for help.
2 Answers
The batch endpoint expects a JSON file, but for some reason Azure adds a hidden file, ".amlignore", to the URI_FOLDER from which the mini-batches were read. Azure could not process that file and therefore threw errors. In my case the file sat in the input folder next to the JSON input.
Based on the error messages you provided, it appears that the CSV input data does not contain the features the model requires.
To fix this, make sure the column headers in your input CSV file match the feature names the model expects. You can check them against the training data used in the Azure designer.
For more details, please refer to this tutorial on batch endpoints.
For more details on the supported input data formats, see this documentation.