I need guidance on passing command-line arguments to a SageMaker training job using the Boto3 API. Here is my Dockerfile:
FROM public.ecr.aws/ubuntu/ubuntu:22.04
LABEL version="2.0"
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    build-essential \
    python3-dev \
    python3-pip \
    python3-setuptools \
    nginx \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN python3.10 -m pip install pip --upgrade && pip install --upgrade cython
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY requirements.txt .
RUN pip --no-cache-dir install -r requirements.txt
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code/:${PATH}"
ENV PYTHONPATH="/opt/ml/code/:${PYTHONPATH}"
COPY src/ /opt/ml/code/
WORKDIR /opt/ml/code/
ENTRYPOINT [ "python", "/opt/ml/code/entry_point.py" ]
The entry_point.py script is as follows:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--mode", type=str, required=True)
# Flag name matches args.region_id used below
parser.add_argument("--region_id", type=int)
args = parser.parse_args()
if args.mode == "inference":
    run_inference(args.region_id)
elif args.mode == "training":
    run_training(args.region_id)
else:
    raise ValueError(f"Unknown mode: {args.mode}")
The image has been published to AWS ECR. I use the following boto3 call to start the job:
import boto3

session = boto3.Session(profile_name='algoprod')
client = session.client('sagemaker', region_name='us-east-1')
training_job_name = 'sagemaker-training-demo'
resp = client.create_training_job(
    TrainingJobName=training_job_name,
    RoleArn="xxxx",
    AlgorithmSpecification={
        'TrainingImage': "image:latest",
        'TrainingInputMode': "File",
        'ContainerArguments': [
            '--mode training',
            '--region_id 1',
        ]
    },
    # Other required parameters (such as OutputDataConfig and StoppingCondition) are not shown
)
print(resp)
The boto3 call above successfully starts the SageMaker training job in AWS, but the job fails with the following error message:
entry_point.py: error: the following arguments are required: --mode
The --mode argument was passed through ContainerArguments, following the Boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_training_job.html
Please advise.
2 Answers
Figured out how it should be passed.
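For anyone landing here: the error can be reproduced locally without SageMaker. The sketch below assumes that each element of ContainerArguments reaches the entrypoint as exactly one argv token, so '--mode training' arrives as a single string that argparse does not recognise as the --mode flag. The parser mirrors the one in entry_point.py, using --region_id as the flag name (an assumption based on the arguments passed to the job); build_parser and try_parse are hypothetical helpers for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the parser in entry_point.py (flag name --region_id is assumed).
    parser = argparse.ArgumentParser(prog="entry_point.py")
    parser.add_argument("--mode", type=str, required=True)
    parser.add_argument("--region_id", type=int)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints its error to stderr and exits with status 2.
        return None


# Flag and value fused into one token, as in the failing job.
fused = try_parse(["--mode training", "--region_id 1"])

# One list element per token, the way a shell would split the command line.
split = try_parse(["--mode", "training", "--region_id", "1"])

print("fused:", fused)
print("split:", split)
```

If that assumption holds, the same split form in the job definition would be 'ContainerArguments': ['--mode', 'training', '--region_id', '1'].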
Maybe the solution is as simple as putting training into quotes: "training". 1 is understood as an integer, but training without quotes is interpreted as a variable.
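On the quoting point, it may help to note that every element of ContainerArguments is already a Python string in the call above, and argparse converts "1" to an integer through type=int. A small helper can build the list so that flags and values always end up as separate string elements. This is only a sketch: to_container_arguments is a hypothetical name, not part of boto3, and the image name is the placeholder from the question.

```python
def to_container_arguments(params):
    """Flatten {"mode": "training"} into ["--mode", "training"]."""
    args = []
    for name, value in params.items():
        args.append(f"--{name}")   # the flag is one list element
        args.append(str(value))    # its value is the next element, always a string
    return args


algorithm_specification = {
    "TrainingImage": "image:latest",  # placeholder from the question
    "TrainingInputMode": "File",
    "ContainerArguments": to_container_arguments(
        {"mode": "training", "region_id": 1}
    ),
}

print(algorithm_specification["ContainerArguments"])
```

The resulting dictionary could then be passed as AlgorithmSpecification to create_training_job alongside the other job parameters.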