I’m using the following code to run an sklearn processing job in SageMaker:
import os

import boto3
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name
role = sagemaker.get_execution_role()

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    # sagemaker_session=Session()
)

# bucket and prefix are defined elsewhere; the joined result must be a valid S3 URI
out_path = os.path.join(bucket, prefix, "test_transform/data.csv")

sklearn_processor.run(
    code="preprocess.py",
    inputs=[
        ProcessingInput(
            source="my_package/",
            destination="/opt/ml/processing/input/code/my_package/",
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="test_transform_data",
            source="/opt/ml/processing/output/test_transform",
            destination=out_path,
        ),
    ],
    arguments=["--time-slot-minutes", "30min"],
)
The above code runs preprocess.py, which loads data from a Snowflake database using credentials stored in AWS Secrets Manager:
region = boto3.Session().region_name
secrets_client = boto3.client(service_name='secretsmanager', region_name=region)
Here is where the error happens: the first line returns None for the region (inside the processing container, boto3 finds no region configuration), so the second line raises botocore.exceptions.NoRegionError: You must specify a region.
Given that, how can I pass the region to SKLearnProcessor, or is there another way to make this code work inside the processing job instance?
FYI: the input source 'my_package/' has the structure below, which installs packages and includes the Python dependencies used by preprocess.py (a sketch of how preprocess.py might consume this input follows the tree):
├── my_package
│ ├── file1.py
│ ├── file2.py
│ └── requirements.txt
└── preprocess.py
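A minimal sketch of how preprocess.py could consume that input; the paths are the ones declared in the ProcessingInput above, and installing requirements.txt via pip at startup is an assumption about the setup, not something shown in the question:

import subprocess
import sys

# Paths match the ProcessingInput destination declared in the run() call.
code_dir = "/opt/ml/processing/input/code"
pkg_dir = f"{code_dir}/my_package"

# Install the shipped requirements, then make my_package importable.
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", f"{pkg_dir}/requirements.txt"])
sys.path.insert(0, code_dir)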
Thanks
2 Answers
Setting the region inside preprocess.py solved the issue.
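A minimal sketch of that kind of fix, assuming us-west-2 is the region where the secret lives (set it before any boto3 client is created):

import os

# "us-west-2" is an assumption -- use the region that holds your secret.
os.environ.setdefault("AWS_DEFAULT_REGION", "us-west-2")

# Equivalently, pass the region explicitly when creating the client:
# secrets_client = boto3.client("secretsmanager", region_name="us-west-2")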
You do need to tell boto3 which region to use. The two easiest ways are to either set it in your ~/.aws/config or use an environment variable, as in:
export AWS_DEFAULT_REGION=us-west-2
Check out the documentation on how boto3 resolves its configuration:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
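Note that a processing job runs in its own container on a separate instance, so a ~/.aws/config or an export on your notebook won't reach it. One way to forward the region is the env parameter that SKLearnProcessor accepts, which passes environment variables into the job container; a sketch, resolving the region on the notebook where it is known:

region = boto3.session.Session().region_name  # resolved on the notebook, where a region is configured

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    env={"AWS_DEFAULT_REGION": region},  # boto3 inside preprocess.py reads this
)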