I’ve been going round in circles trying to write to a Blob Storage account in Azure. Currently I’m creating a Spark session with the following setup:
from pyspark.sql import SparkSession

# spark.jars.packages expects one comma-separated string with no line breaks,
# so the Maven coordinates are joined from a list.
packages = ",".join([
    "org.apache.hadoop:hadoop-azure:3.3.1",
    "com.microsoft.azure:azure-storage-blob:11.0.1",
    "org.apache.hadoop:hadoop-azure:3.4.0",
    "org.apache.hadoop:hadoop-azure:3.3.1",
    "org.eclipse.jetty:jetty-util:11.0.7",
    "org.apache.hadoop.thirdparty:hadoop-shaded-guava:1.1.1",
    "org.apache.httpcomponents:httpclient:4.5.13",
    "com.fasterxml.jackson.core:jackson-databind:2.13.1",
    "com.fasterxml.jackson.core:jackson-core:2.13.1",
    "org.eclipse.jetty:jetty-util-ajax:11.0.7",
    "org.apache.hadoop:hadoop-common:3.3.1",
    "com.microsoft.azure:azure-keyvault-core:1.2.6",
])

spark = (
    SparkSession.builder
    .appName("Azure Blob Storage Access")
    .config("spark.jars.packages", packages)
    .getOrCreate()
)
Which results in the following error:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
I’ve also tried installing the above jars on the following path:
/opt/homebrew/Cellar/apache-spark/3.5.1/libexec/jars
But I end up with the same error being kicked back. I’m at a bit of a loss where to go with this. I’ve seen a couple of success stories running this code, either by setting the jars parameter within the session config or by having the jars installed locally. How the hell do I get this working?
UPDATE:
I’ve confirmed the SPARK_HOME and JAVA_HOME paths in the .env. With the jar files installed locally, I’ve managed to get a different error now:
Py4JJavaError: An error occurred while calling o307.csv.
: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/ajax/JSON$Convertor
Not sure this is any better, but at least it’s different.
2 Answers
The root cause was finally identified as the jetty-util and jetty-util-ajax versions being 11 instead of 9. Thanks to the answers above pushing me, I managed to confirm that the Azure and Hadoop jars were not the issue. Eventually I came across this link about Jetty Util having deprecated classes in Jetty Util version 10+. I dropped down to jetty-util and jetty-util-ajax v9.4.45 from Maven, updated my JDK to OpenJDK 16, and successfully wrote to the Azure Storage account.
Here’s how I did it:
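A minimal sketch of the kind of setup described above, not necessarily the exact code that was used. It assumes Hadoop 3.3.1 and Jetty 9.4.45 as mentioned in this answer; the full Jetty version string 9.4.45.v20220203 is an assumption about the Maven coordinate and should be checked on Maven Central. The helper names (jetty_10_plus, packages_string, build_session) and the account/key arguments are hypothetical.

```python
from typing import List

# Assumed coordinates: Hadoop 3.3.1 and Jetty 9.4.45 follow the answer above;
# the full Jetty version string is an assumption, check it on Maven Central.
PACKAGES = [
    "org.apache.hadoop:hadoop-azure:3.3.1",
    "org.eclipse.jetty:jetty-util:9.4.45.v20220203",
    "org.eclipse.jetty:jetty-util-ajax:9.4.45.v20220203",
]


def jetty_10_plus(coords: List[str]) -> List[str]:
    """Return the org.eclipse.jetty coordinates whose major version is 10 or higher."""
    flagged = []
    for coord in coords:
        group, _artifact, version = coord.strip().split(":")
        if group == "org.eclipse.jetty" and int(version.split(".")[0]) >= 10:
            flagged.append(coord.strip())
    return flagged


def packages_string(coords: List[str]) -> str:
    """Build a spark.jars.packages value: no duplicates, no Jetty 10+, no whitespace."""
    bad = jetty_10_plus(coords)
    if bad:
        raise ValueError(f"Jetty 10+ found: {bad}")
    unique = list(dict.fromkeys(c.strip() for c in coords))
    return ",".join(unique)


def build_session(account: str, key: str):
    """Create a SparkSession for wasbs:// access; account and key are placeholders."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("Azure Blob Storage Access")
        .config("spark.jars.packages", packages_string(PACKAGES))
        # Hadoop's account-key property for Blob Storage, passed via spark.hadoop.*
        .config(f"spark.hadoop.fs.azure.account.key.{account}.blob.core.windows.net", key)
        .getOrCreate()
    )
```

Calling build_session("<storage-account>", "<access-key>") and then writing to a wasbs://<container>@<storage-account>.blob.core.windows.net/... path would be the next step. The Jetty check is only there to catch the version 11 jars from the question before Spark starts; if jars were also copied into the Spark jars directory, any Jetty 11 copies there would presumably need removing too.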