I have created a Glue catalog in my account. It has one database and one table.
I followed this guide from AWS and created my EMR cluster. However, when I run spark-shell
and try to access the Glue catalog, the database from the Glue catalog is not visible on my EMR cluster.
What am I missing?
2 Answers
This turned out to be a non-issue. I was trying to launch an EMR cluster in us-east-1, and for some reason the cluster never finished provisioning, even though the underlying EC2 instances were provisioned and in the running state. I was able to SSH to the EC2 instances and run spark-shell on them too, which is why it looked like a working cluster.
I then launched an EMR cluster in us-east-2, and it was provisioned completely. From there I was able to connect to the Glue catalog successfully.
It doesn't look like Spark is using the Glue Data Catalog in your cluster. Did you enable the Glue catalog option for Spark when creating the cluster? For an existing cluster, you can check the cluster configuration in the console. It should contain a `spark-hive-site` classification that sets `hive.metastore.client.factory.class` to the Glue client factory.
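For reference, the setting that AWS documents for pointing Spark on EMR at the Glue Data Catalog looks roughly like the fragment below. Treat it as a sketch of what to look for: your cluster may carry other classifications alongside it.

```json
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```

If this classification is missing, Spark falls back to its default metastore and will not list the Glue databases.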
If your cluster has the above config set and Spark is still unable to fetch info from the Glue catalog, you may want to enable DEBUG-level logging in Spark to get more details.
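A minimal way to do that from inside spark-shell is sketched below. It assumes the `spark` and `sc` values that spark-shell predefines, and it only raises the log level for the current session.

```scala
// Run inside spark-shell, where `spark` (SparkSession) and `sc` (SparkContext) already exist.

// Raise the log level for this session so metastore client calls show up in the output.
sc.setLogLevel("DEBUG")

// Should print "hive" when Spark was started with Hive support enabled.
println(spark.conf.get("spark.sql.catalogImplementation"))

// Lists the databases Spark can see; the Glue database should appear here.
spark.sql("SHOW DATABASES").show(false)
```

With DEBUG enabled, the output of the `SHOW DATABASES` call should indicate which metastore client Spark is using and surface any permission or endpoint errors.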