I know very little about Java, but I have a Java question. I’ve got a Docker container for an image called amazon/aws-glue-libs. This lets me run and test AWS Glue ETL code locally on my Mac without having to use the AWS Console. It also lets me debug and single-step through the ETL code, which is fantastic. However, I hit a snag trying to use JDBC to connect to my RDS MySQL database in my sandbox. The JDBC code works if run in the AWS Glue Console, but dies with a big list of Java messages, the key one being the last line of this:
Traceback (most recent call last):
  File "/opt/project/glue/etl/script.py", line 697, in <module>
    .load()
  File "/home/glue_user/spark/python/pyspark/sql/readwriter.py", line 184, in load
    return self._df(self._jreader.load())
  File "/home/glue_user/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/home/glue_user/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o241.load.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
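That last line is the real failure: the JVM behind Spark cannot find the MySQL Connector/J driver class on its classpath, and everything above it is just py4j relaying the exception back to Python. A quick way to confirm from a bash shell in the container; the /home/glue_user/spark prefix comes from the traceback above, but the jars subdirectory is an assumption based on the standard Spark layout:

# Look for the connector in the directory Spark loads bundled JARs from.
ls /home/glue_user/spark/jars | grep -i mysql

# If a candidate JAR turns up, confirm it actually contains the driver class
# (requires unzip to be available in the container):
unzip -l /home/glue_user/spark/jars/mysql-connector-j-8.0.32.jar | grep com/mysql/cj/jdbc/Driver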
Here is a sample of the kind of code I’m trying to run:
person_df = (
    spark.read
    .format("jdbc")
    .option("url", JDBC_URL)
    .option("dbtable", "person")
    .option("user", USERNAME)
    .option("password", PASSWORD)
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load()
)
I can get a bash shell inside the Docker container. Where should I look to find this class/driver? Or what else should I be looking at to resolve this problem?
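For reference, a common way to make a driver JAR visible inside the amazon/aws-glue-libs container is to mount a host directory when starting it. A minimal sketch, assuming the glue_libs_4.0.0_image_01 tag and a local jars/ directory holding the connector; both names are assumptions, so adjust to your setup:

docker run -it \
  -v ~/.aws:/home/glue_user/.aws \
  -v "$PWD/jars":/home/glue_user/extra-jars \
  amazon/aws-glue-libs:glue_libs_4.0.0_image_01 bash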
Answer:
Add the driver JAR to S3 and use the --extra-jars job parameter to inject it into the job's classpath.
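In the Glue console this is set under Job details > Job parameters, with --extra-jars as the key and the S3 path as the value. A sketch of the same thing via the AWS CLI at job creation; the job name, role, and S3 paths are placeholders, not values from the question:

aws glue create-job \
  --name my-etl-job \
  --role MyGlueServiceRole \
  --glue-version 4.0 \
  --command Name=glueetl,ScriptLocation=s3://S3BUCKET/scripts/SPARK_SCRIPT.py,PythonVersion=3 \
  --default-arguments '{"--extra-jars":"s3://S3BUCKET/jars/mysql-connector-j-8.0.32.jar"}'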
Or, when running locally inside the container, execute your script like this:

spark-submit --jars s3://S3BUCKET/jars/mysql-connector-j-8.0.32.jar SPARK_SCRIPT.py
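If you would rather avoid the S3 round trip entirely while testing locally, another option is to put the connector on the container's Spark classpath directly. A sketch assuming the Maven Central coordinates for Connector/J 8.0.32 and the container paths used above; both are assumptions to verify against your image:

# Run inside the container: fetch the driver into the directory Spark loads JARs from,
# so every session picks it up without extra flags.
curl -L -o /home/glue_user/spark/jars/mysql-connector-j-8.0.32.jar \
  https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.32/mysql-connector-j-8.0.32.jar

# Or keep the JAR elsewhere (e.g. a mounted directory) and pass it per run:
spark-submit --jars /home/glue_user/extra-jars/mysql-connector-j-8.0.32.jar SPARK_SCRIPT.py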