I’ve been going round in circles trying to write to a blob storage account in Azure. Currently I’m creating a Spark session with the following setup:
spark = SparkSession.builder
.appName("Azure Blob Storage Access")
.config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1
Which results in the following error:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
I’ve also tried installing the above jars on the following path
But I end up with the same error being kicked back. I’m at a bit of a loss where to go with this. I’ve seen a couple of success stories running this code, either by setting the jar parameter within the session config or by having the jars installed locally. How the hell do I get this working?
I’ve confirmed the SPARK_HOME and JAVA_HOME paths in the .env. With the jar files installed locally, I’ve managed to get a different error now:
Py4JJavaError: An error occurred while calling o307.csv.
: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/ajax/JSON$Convertor
Not sure this is any better, but at least it’s different.
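Since the missing class lives under org/eclipse/jetty/util/ajax, one way to narrow an error like this down is to list which Jetty util jars are sitting in the jar directory and at what major version. The following is a minimal sketch, assuming the jars follow the usual Maven naming (for example jetty-util-ajax-9.4.45.v20220203.jar); the function name `find_jetty_jars` is made up for illustration.

```python
import re
from pathlib import Path

# Assumes Maven-style file names such as jetty-util-9.4.45.v20220203.jar
# or jetty-util-ajax-11.0.7.jar; other naming schemes are simply skipped.
JETTY_JAR = re.compile(r"^(jetty-util(?:-ajax)?)-(\d+)\.([\w.]+)\.jar$")


def find_jetty_jars(jar_dir):
    """Return (artifact, major_version, filename) for each Jetty util jar found."""
    found = []
    for path in sorted(Path(jar_dir).glob("jetty-util*.jar")):
        match = JETTY_JAR.match(path.name)
        if match:
            found.append((match.group(1), int(match.group(2)), path.name))
    return found
```

Pointing it at the directory holding the jars (for instance the jars folder under SPARK_HOME, if that is where they were installed) gives a quick view of whether jetty-util and jetty-util-ajax are present and on the same major version.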
Root cause finally identified: Jetty-Util and Jetty-Util-Ajax were at version 11 instead of 9. Thanks to the answers above pushing me along, I managed to confirm the Azure and Hadoop jars were not the issue. Eventually I came across this link about Jetty Util having deprecated classes in Jetty Util version 10+. I dropped down to Jetty-Util and Jetty-Util-Ajax v9.4.45 from Maven, updated my JDK to OpenJDK 16, and successfully wrote to the Azure Storage Account.
Here’s how I did it:
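A minimal sketch of a setup along these lines, not the exact code used. It assumes the jars (hadoop-azure, its dependencies, and the Jetty 9.4.45 jars) sit in one local directory, that access is by storage account key, and that writes go through the wasbs:// scheme. The helper names, jar directory, account, container, and key are all placeholders.

```python
from pathlib import Path


def build_conf(jar_dir, account, account_key):
    """Build Spark config entries for local jars plus a storage account key."""
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    return {
        # spark.jars takes a comma-separated list of jar paths.
        "spark.jars": ",".join(jars),
        # The spark.hadoop. prefix passes the property through to Hadoop,
        # where the WASB connector looks up the account key.
        f"spark.hadoop.fs.azure.account.key.{account}.blob.core.windows.net": account_key,
    }


def wasbs_url(container, account, path):
    """Build a wasbs:// URL for a path inside a blob container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def make_session(conf, app_name="Azure Blob Storage Access"):
    """Create a SparkSession with every entry in conf applied."""
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical usage; every value below is a placeholder:
# conf = build_conf("/path/to/jars", "myaccount", "<account-key>")
# spark = make_session(conf)
# df.write.csv(wasbs_url("mycontainer", "myaccount", "output/data"))
```

Keeping the jar list and the account key in a plain dict makes it easy to print and check exactly what the session is being given before Spark starts.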