
I have created a Glue catalog in my account. It contains one database and one table.
Screenshot of Glue catalog from AWS console

I followed this guide from AWS and created my EMR cluster. However, when I run spark-shell and try to access the Glue catalog, the database from the Glue catalog does not show up in EMR.
Screenshot of terminal showing spark-shell

What am I missing?

2 Answers


  1. Chosen as BEST ANSWER

    This turned out to be a non-issue. I was trying to launch an EMR cluster in us-east-1, and for some reason the cluster never finished provisioning, even though the underlying EC2 instances were provisioned and in the running state. I was able to SSH into the EC2 instances and run spark-shell on them too.

    I then launched an EMR cluster in us-east-2, and it provisioned completely. From that cluster I was able to connect to the Glue catalog successfully.


  2. It doesn't look like Spark is using the Glue Data Catalog in your cluster. Did you enable the Glue catalog option for Spark when creating the cluster? For an existing cluster, you can check the cluster configuration in the console. It should contain something like this:

    [
      {
        "Classification": "spark-hive-site",
        "Properties": {
          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        }
      }
    ]
    

    If your cluster has the above configuration set and Spark is still unable to fetch information from the Glue catalog, you may want to enable DEBUG-level logging in Spark for more details.
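
If you would rather check the configuration programmatically than by eye, something like the following Python sketch may help. It assumes you have copied the cluster's configuration JSON out of the console; the function name `spark_uses_glue_catalog` and the variable `cluster_config` are made up for this example and are not part of any AWS tooling. It only inspects the JSON and does not talk to AWS.

```python
import json

# Values taken from the configuration shown in the answer above.
GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
FACTORY_KEY = "hive.metastore.client.factory.class"


def spark_uses_glue_catalog(configurations):
    """Return True if a spark-hive-site entry points at the Glue client factory.

    `configurations` is the parsed list of classification objects
    (hypothetical helper, for illustration only).
    """
    for entry in configurations:
        # Only the spark-hive-site classification affects Spark's metastore client.
        if entry.get("Classification") != "spark-hive-site":
            continue
        if entry.get("Properties", {}).get(FACTORY_KEY) == GLUE_FACTORY:
            return True
    return False


# Example input: the configuration from the answer, pasted as JSON text.
cluster_config = json.loads("""
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
""")

print(spark_uses_glue_catalog(cluster_config))
```

Note that a cluster where the property is set only under a different classification (for example `hive-site`, which configures Hive rather than Spark) would make this check return False, which matches the symptom described in the question.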
