**I'm trying to pass arguments to my PySpark script through the `entryPointArguments` parameter of the boto3 `emr-serverless` client, but it doesn't work at all. I would like to know if I'm doing it the right way.**

**My Python code looks like this:**
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-env', nargs='?', metavar='Environment', type=str,
                    help='String: Environment to run. Options: [dev, prd]',
                    choices=['dev', 'prd'],
                    required=True,
                    default="prd")

# Capture args
args = parser.parse_args()
env = args.env
print(f"HELLO WORLD FROM {env}")
```
**And my script that runs EMR Serverless looks like this:**
```python
jobDriver={
    "sparkSubmit": {
        "entryPoint": "s3://example-bucket-us-east-1-codes-prd/hello_world.py",
        "entryPointArguments": ["-env prd"],
        # Adjacent string literals are joined into one parameter string
        "sparkSubmitParameters": (
            "--conf spark.executor.cores=2 "
            "--conf spark.executor.memory=4g "
            "--conf spark.driver.cores=2 "
            "--conf spark.driver.memory=8g "
            "--conf spark.executor.instances=1 "
            "--conf spark.dynamicAllocation.maxExecutors=12"
        ),
    }
}
```
**I've already tried single quotes and double quotes, and I've tried passing these parameters in `sparkSubmitParameters`, but so far nothing works. There aren't many examples of how to do this on the internet, so my hope is that someone has already done it successfully. Thank you!**
2 Answers
I was testing it out, and I ended up figuring out how to do this. From what I understand, when it's a param like `-env prd`, you have to pass it in `entryPointArguments` as `["-env", "prd"]`: the argument name first, then its value, each one as a separate list element.
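
As a minimal sketch of that idea, the snippet below builds the `entryPointArguments` list with one element per token and checks it locally against a parser like the one in the question. The helper name `build_job_driver` is hypothetical, and the commented-out `start_job_run` call uses placeholder IDs; treat it as an illustration rather than a tested deployment.

```python
import argparse


def build_job_driver(entry_point: str, script_args: dict) -> dict:
    """Build a jobDriver dict with one list element per argument token.

    Hypothetical helper for illustration; not part of boto3.
    """
    entry_point_arguments = []
    for flag, value in script_args.items():
        # The flag and its value are separate list elements, so the script
        # sees them as separate command-line tokens.
        entry_point_arguments.extend([flag, str(value)])
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": entry_point_arguments,
        }
    }


job_driver = build_job_driver(
    "s3://example-bucket-us-east-1-codes-prd/hello_world.py",
    {"-env": "prd"},
)

# Local check: feed the same list to a parser like the one in the question.
parser = argparse.ArgumentParser()
parser.add_argument("-env", choices=["dev", "prd"], required=True)
args = parser.parse_args(job_driver["sparkSubmit"]["entryPointArguments"])
print(args.env)  # prints "prd"

# Submitting the job would then look roughly like this (placeholder IDs):
# import boto3
# client = boto3.client("emr-serverless")
# client.start_job_run(
#     applicationId="<application-id>",
#     executionRoleArn="<execution-role-arn>",
#     jobDriver=job_driver,
# )
```

Parsing the list locally before submitting is a cheap way to catch a malformed argument list without waiting for a job run to fail.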
To pass parameters into the application, specify a configuration named `entryPointArguments` in the `sparkSubmit` part of the command.
Below I pasted a full AWS CLI command that runs a job on an EMR Serverless application and passes named arguments into a Python script containing PySpark code. Additional parameters in the Spark submit part of the command pass packages (utilities.zip) and jar files (JDBC_Driver.jar) to the Spark executors so that the application can use them.
The `--execution-role-arn` value should come from IAM, and `--application-id` is the EMR Serverless application (which must be created beforehand) that will run the job.
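
A rough Python sketch of the same structure, assuming the extra files are handed to Spark through the standard `spark.submit.pyFiles` and `spark.jars` properties. The bucket name, the entry point path, and the helper `build_spark_submit_parameters` are all hypothetical; only the file names utilities.zip and JDBC_Driver.jar come from the description above.

```python
def build_spark_submit_parameters(py_files, jars, extra_conf=None):
    """Join Spark properties into a single --conf string.

    Hypothetical helper; assumes dependencies are passed via the
    spark.submit.pyFiles and spark.jars properties.
    """
    conf = {
        "spark.submit.pyFiles": ",".join(py_files),
        "spark.jars": ",".join(jars),
    }
    conf.update(extra_conf or {})
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


# Hypothetical S3 locations for the files named in the answer.
spark_submit_parameters = build_spark_submit_parameters(
    py_files=["s3://my-bucket/utilities.zip"],
    jars=["s3://my-bucket/JDBC_Driver.jar"],
    extra_conf={"spark.executor.cores": "2"},
)

job_driver = {
    "sparkSubmit": {
        "entryPoint": "s3://my-bucket/script.py",  # hypothetical path
        # Named arguments: flag and value as separate elements.
        "entryPointArguments": ["-env", "prd"],
        "sparkSubmitParameters": spark_submit_parameters,
    }
}

print(spark_submit_parameters)
```

The resulting `job_driver` dict can be passed as the `jobDriver` argument of the boto3 `emr-serverless` client's `start_job_run`, together with the application ID and execution role ARN.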