
I am creating a Glue job with the boto3 create_job call and want to pass a default argument holding the source path, so that the same job can run against files in different S3 buckets.

The script below is sample code that creates a Glue ETL job. How can I pass a parameter to the source path using args?

Sample script:

import boto3
import json

client = boto3.client('glue')
response = client.create_job(
    Name='jobname',
    Description='Glue Job',
    LogUri='s3://bucket/logs/',
    Role='arn:aws:iam::',
    ExecutionProperty={
        'MaxConcurrentRuns': 3
    },
    Command={
        'Name': 'glue',
        'ScriptLocation': 's3://bucketname/gluejob.py',
        'PythonVersion': '3'
    },
    MaxRetries=1,
    Timeout=123,
    GlueVersion='3.0',
    NumberOfWorkers=2,
    WorkerType='G.1X',
    DefaultArguments={'s3sourcepath': 's3://bucketname/csvfile.csv'},
    CodeGenConfigurationNodes={
        'node-1': {
            'S3CsvSource': {
                'Name': 's3_source',
                'Paths': [
                    args['s3sourcepath'],  # <-- how do I pass the default argument here?
                ],
                'Separator': 'comma',
                'QuoteChar': 'quote',
                'WithHeader': True,
                'WriteHeader': True
            }
        }
    }
)

Thanks in advance.

3 Answers


  1. You first need to retrieve the arguments that you have passed using getResolvedOptions. Something like this:

    import sys
    from awsglue.utils import getResolvedOptions
    
    args = getResolvedOptions(sys.argv, ['s3sourcepath'])
    

    Now you should be able to use
    args['s3sourcepath']

    You can read this for more info.
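As a minimal sketch of how the job script (gluejob.py) might read the parameter: the helper name `resolve_args` and the argparse fallback are assumptions added here so the snippet can also run outside Glue, where the `awsglue` module is not installed. Inside a Glue job, the `getResolvedOptions` branch is the one that runs.

```python
import argparse


def resolve_args(argv, names):
    """Return {name: value} for each '--name value' pair found in argv."""
    try:
        # Available inside the AWS Glue runtime.
        from awsglue.utils import getResolvedOptions
        return getResolvedOptions(argv, names)
    except ImportError:
        # Hypothetical local fallback (not part of Glue): parse the same
        # '--name value' pairs with argparse and ignore unknown options.
        parser = argparse.ArgumentParser()
        for name in names:
            parser.add_argument(f"--{name}", dest=name, required=True)
        known, _unknown = parser.parse_known_args(argv[1:])
        return {name: getattr(known, name) for name in names}


# In the job script this would typically be called as:
#   import sys
#   args = resolve_args(sys.argv, ['s3sourcepath'])
#   source_path = args['s3sourcepath']
```

Note that the name given to `getResolvedOptions` has no leading dashes, while the argument arrives on the command line as `--s3sourcepath`.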

  2. I'm not sure whether these run-time parameters can be set when creating a Glue job. Try setting them when you call start_job_run() instead. You can refer here for code samples:

    response = client.start_job_run(
        JobName='my_test_Job',
        Arguments={
            '--s3sourcepath': 's3 path',
        }
    )
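The per-run override could be wrapped as in the sketch below. The helper names `job_arguments` and `start_run_with_path` are hypothetical; the client is passed in so the function can be tried without AWS credentials. The keys carry a leading `--`, which is the form Glue expects for job parameters. The same mapping can also be supplied as `DefaultArguments` in `create_job`, and a value passed in `Arguments` to `start_job_run` replaces the default for that run.

```python
def job_arguments(source_path):
    """Build the job-parameter mapping; each key starts with '--'."""
    return {"--s3sourcepath": source_path}


def start_run_with_path(client, job_name, source_path):
    """Start one run of an existing job with a specific source path."""
    return client.start_job_run(
        JobName=job_name,
        Arguments=job_arguments(source_path),
    )


# Usage (requires AWS credentials, so shown as comments only):
#   import boto3
#   client = boto3.client("glue")
#   start_run_with_path(client, "jobname", "s3://bucketname/otherfile.csv")
```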
    
  3. In your code, the job command is given as glue.

    Command={
        'Name': 'glue',
        'ScriptLocation': 's3://bucketname/gluejob.py',
        'PythonVersion': '3'
    },

    But the documentation here says it should be glueetl:

    Command={
        'Name': 'glueetl',
        'ScriptLocation': 's3://bucketname/gluejob.py',
        'PythonVersion': '3'
    },
    

    Can you try with

    'Name': 'glueetl'
    