
I know very little about Java, but I have a Java question. I've got a Docker container for an image called amazon/aws-glue-libs, which lets me run and test AWS Glue ETL code locally on my Mac without using the AWS Console. It also lets me debug and single-step through the ETL code, which is fantastic. However, I hit a snag trying to use JDBC to connect to the RDS MySQL database in my sandbox. The same JDBC code works when run in the AWS Glue Console, but locally it dies with a long list of Java messages; the key one is the last line of this traceback:

Traceback (most recent call last):
  File "/opt/project/glue/etl/script.py", line 697, in
    .load()
  File "/home/glue_user/spark/python/pyspark/sql/readwriter.py", line 184, in load
    return self._df(self._jreader.load())
  File "/home/glue_user/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/home/glue_user/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o241.load.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver

Here is a sample of the kind of code I’m trying to run:

person_df = spark.read \
    .format("jdbc") \
    .option("url", JDBC_URL) \
    .option("dbtable", "person") \
    .option("user", USERNAME) \
    .option("password", PASSWORD) \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .load()

I can get a bash shell inside the Docker container. Where should I look to find this class/driver/etc? Or what else should I be looking at to resolve this problem?

2 Answers


  1. Add the driver JAR to S3 and use the --extra-jars job parameter to inject it.

  2. Execute your command like this:
    spark-submit --jars s3://S3BUCKET/jars/mysql-connector-j-8.0.32.jar SPARK_SCRIPT.py
