I am creating a Glue job with the boto3 create_job call and want to pass a default argument value for the source path, so that the job can run against files in different S3 buckets.
The script below is sample code that creates a Glue ETL job. How do I pass the parameter to the source path using args?
Sample script:
import boto3
import json

client = boto3.client('glue')

response = client.create_job(
    Name='jobname',
    Description='Glue Job',
    LogUri='s3://bucket/logs/',
    Role='arn:aws:iam::',
    ExecutionProperty={
        'MaxConcurrentRuns': 3
    },
    Command={
        'Name': 'glue',
        'ScriptLocation': 's3://bucketname/gluejob.py',
        'PythonVersion': '3'
    },
    MaxRetries=1,
    Timeout=123,
    GlueVersion='3.0',
    NumberOfWorkers=2,
    WorkerType='G.1X',
    DefaultArguments={'s3sourcepath': 's3://bucketname/csvfile.csv'},
    CodeGenConfigurationNodes={
        'node-1': {
            'S3CsvSource': {
                'Name': 's3_source',
                'Paths': [
                    args['s3sourcepath'],  # <-- how do I pass the default argument here?
                ],
                'Separator': 'comma',
                'QuoteChar': 'quote',
                'WithHeader': True,
                'WriteHeader': True
            }
        }
    }
)
Thanks in advance.
3 Answers
You first need to retrieve the arguments you passed, using getResolvedOptions inside the job script. Something like this:
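A minimal sketch of that retrieval step, assuming the job argument is named s3sourcepath as in the question and was passed to the job as --s3sourcepath. It belongs in the job script (gluejob.py), not in the boto3 script that creates the job. The awsglue module is only available inside the Glue runtime, so the argparse fallback and the read_source_path helper below are hypothetical additions that let the sketch run locally.

```python
import sys

try:
    # Available inside the AWS Glue runtime.
    from awsglue.utils import getResolvedOptions
except ImportError:
    # Hypothetical local stand-in (an assumption, not part of awsglue) so the
    # sketch can be exercised outside Glue. It accepts "--name value" pairs.
    import argparse

    def getResolvedOptions(argv, options):
        parser = argparse.ArgumentParser()
        for name in options:
            parser.add_argument("--" + name, required=True)
        parsed, _unknown = parser.parse_known_args(argv[1:])
        return vars(parsed)


def read_source_path(argv):
    """Return the value passed to the job as --s3sourcepath."""
    args = getResolvedOptions(argv, ["s3sourcepath"])
    return args["s3sourcepath"]


# Inside the job script this would typically be called as:
#   source_path = read_source_path(sys.argv)
```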
Now you should be able to use args['s3sourcepath'] in the job script.
You can read this for more info.
I am not sure whether these run-time parameters can be set while creating a Glue job. Can you try setting the run-time parameters when you call start_job_run() instead? You can refer here for code samples.
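A minimal sketch of passing the path per run, assuming the job already exists and reads the value with getResolvedOptions. The helper name start_run_with_source is hypothetical. The client is passed in as a parameter so the function can be tried with a stub before pointing it at AWS.

```python
def start_run_with_source(glue_client, job_name, source_path):
    """Start one run of an existing Glue job with its own s3sourcepath value."""
    response = glue_client.start_job_run(
        JobName=job_name,
        # Argument keys carry the "--" prefix so that the job script can
        # resolve them by name with getResolvedOptions.
        Arguments={"--s3sourcepath": source_path},
    )
    # start_job_run returns the ID of the run it started.
    return response["JobRunId"]
```

With a real client this would be called as start_run_with_source(boto3.client('glue'), 'jobname', 's3://bucketname/csvfile.csv'). Arguments given here apply to that run only and take precedence over the job's DefaultArguments.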
In your code, the job command name is given as 'glue'.
But the documentation here says it should be 'glueetl' for a Spark ETL job.
Can you try with 'Name': 'glueetl'?
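A minimal sketch of the create_job arguments with the command name set to glueetl, assuming the other values stay as in the question. The helper name build_job_definition is hypothetical, the Role ARN is left truncated exactly as posted, and the "--" prefix on the DefaultArguments key is an addition here so that the job script can look the value up by name.

```python
def build_job_definition(source_path):
    """Build keyword arguments for glue_client.create_job (sketch only)."""
    return {
        "Name": "jobname",
        "Role": "arn:aws:iam::",  # truncated in the question; fill in your role ARN
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": "s3://bucketname/gluejob.py",
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "NumberOfWorkers": 2,
        "WorkerType": "G.1X",
        # Default value for the job parameter; a run can override it.
        "DefaultArguments": {"--s3sourcepath": source_path},
    }
```

The result can be unpacked into the call, for example client.create_job(**build_job_definition('s3://bucketname/csvfile.csv')).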