
I need guidance on passing command-line arguments to a SageMaker training job using the Boto3 API. Here is my Dockerfile:

FROM public.ecr.aws/ubuntu/ubuntu:22.04

LABEL version="2.0"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         build-essential \
         python3-dev \
         python3-pip \
         python3-setuptools \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN python3.10 -m pip install pip --upgrade && pip install --upgrade cython
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY requirements.txt .
RUN pip --no-cache-dir install -r requirements.txt

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code/:${PATH}"
ENV PYTHONPATH="/opt/ml/code/:${PYTHONPATH}"

COPY src/ /opt/ml/code/
WORKDIR /opt/ml/code/

ENTRYPOINT [ "python", "/opt/ml/code/entry_point.py" ]

The entry_point.py script is as follows:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--mode", type=str, required=True)
parser.add_argument("--region_id", type=int)

args = parser.parse_args()

# run_inference and run_training are defined elsewhere (not shown)
if args.mode == "inference":
    run_inference(args.region_id)
elif args.mode == "training":
    run_training(args.region_id)
else:
    raise ValueError(f"Unknown mode: {args.mode}")
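One way to reproduce this kind of failure without launching a SageMaker job is to hand the parser the exact argv list the container would receive. The sketch below assumes that each ContainerArguments element arrives as one argv entry (this is consistent with the error reported below), and it rebuilds the parser with `--region_id`, the flag the job actually passes. `build_parser` and `try_parse` are illustrative helpers, not part of the original script.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Same flags as entry_point.py, using --region_id (the flag the job passes)
    parser = argparse.ArgumentParser(prog="entry_point.py")
    parser.add_argument("--mode", type=str, required=True)
    parser.add_argument("--region_id", type=int)
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse argv exactly as given; return None if argparse would exit."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse has already printed its usage/error message to stderr
        return None


# "--mode training" is ONE token containing a space, so argparse does not
# match it to --mode and treats it as a stray positional value instead.
broken = try_parse(["--mode training", "--region_id 1"])

# Flags and values as separate elements parse as intended.
fixed = try_parse(["--mode", "training", "--region_id", "1"])

print(broken)  # None
print(fixed)   # Namespace(mode='training', region_id=1)
```

For the first list, argparse writes "entry_point.py: error: the following arguments are required: --mode" to stderr, the same message the training job reports.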

The image has been pushed to Amazon ECR. I then start the job with the following Boto3 call:

import boto3

session = boto3.Session(profile_name='algoprod')
client = session.client('sagemaker', region_name='us-east-1')
training_job_name = 'sagemaker-training-demo'
resp = client.create_training_job(
    TrainingJobName=training_job_name,
    RoleArn="xxxx",
    AlgorithmSpecification={
        'TrainingImage': "image:latest",
        'TrainingInputMode': "File",
        'ContainerArguments': [
            '--mode training',
            '--region_id 1',
        ],
    },
    # OutputDataConfig, ResourceConfig and StoppingCondition (also required
    # by create_training_job) are omitted here for brevity
)

print(resp)

The Boto3 call above successfully starts the SageMaker training job, but the job fails with the following error message:

entry_point.py: error: the following arguments are required: --mode

The --mode argument was passed through ContainerArguments, following the guidance in the Boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_training_job.html

Please advise.


Answers


  1. Chosen as BEST ANSWER

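    Figured out how the arguments should be passed: each flag and each value must be a separate list element, and every element must be a string.

    'ContainerArguments': ['--mode', 'training', '--region_id', '1']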
    

  2. Maybe the solution is as simple as putting training in quotes ("training"):

    'ContainerArguments': ['--mode "training"',
                           '--region_id 1',]
    

    Here 1 is understood as an integer, but training without quotes is interpreted as a variable.
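Following the accepted answer, every ContainerArguments element must be a string and each flag and value must be its own element. A small helper can build that list from a dict so the two rules are applied consistently. `to_container_arguments` is a hypothetical helper, not part of Boto3, and this sketch assumes every flag takes a value (it does not handle value-less flags such as argparse `store_true` options).

```python
from typing import Any, Dict, List


def to_container_arguments(params: Dict[str, Any]) -> List[str]:
    """Turn {"mode": "training"} into ["--mode", "training"].

    Each flag and each value becomes its own list element, and every value
    is converted with str() because ContainerArguments only accepts strings.
    """
    args: List[str] = []
    for name, value in params.items():
        args.extend([f"--{name}", str(value)])
    return args


container_arguments = to_container_arguments({"mode": "training", "region_id": 1})
print(container_arguments)  # ['--mode', 'training', '--region_id', '1']
```

The resulting list would then be passed as 'ContainerArguments': container_arguments inside AlgorithmSpecification.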
