How do we copy files from Hadoop to ABFS (Azure Blob File System)?
I want to copy files from the Hadoop file system to the ABFS file system, but it throws an error.
This is the command I ran:
hdfs dfs -ls abfs://….
ls: No FileSystem for scheme "abfs"
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem not found
Any idea how this can be done?
2 Answers
In core-site.xml you need to add a config property for fs.abfs.impl with the value org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem, and then add any other related authentication configuration it may need.
More details on installation and configuration are here: https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html
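If you want to try the property before editing core-site.xml, a rough sketch is to pass it for a single invocation with the generic -D option. ABFS_URL here is a placeholder variable of my own (the URL in the question is elided, so set it yourself). Note that this only sets the property; the class itself still has to be on the classpath, which the ClassNotFoundException in the question suggests it is not.

```shell
# Assumption: ABFS_URL is a placeholder; export it with your own abfs:// URL.
ABFS_URL="${ABFS_URL:-}"
# Implementation class named in the answer above.
ABFS_IMPL="org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem"

if [ -z "$ABFS_URL" ]; then
  echo "Set ABFS_URL to your abfs:// URL first"
elif command -v hdfs >/dev/null 2>&1; then
  # -D sets one configuration property for this invocation only.
  hdfs dfs -D "fs.abfs.impl=${ABFS_IMPL}" -ls "$ABFS_URL"
else
  echo "hdfs not found on PATH"
fi
```

If the listing works this way, the same name/value pair can then go into core-site.xml along with the authentication settings.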
The abfs binding is already in core-default.xml for any release with the abfs client present. However, the hadoop-azure jar and its dependencies are not in the hadoop common/lib directory, where they are needed (they are there in HDI and CDH, but not in the Apache release).
You can tell the hadoop script to pick it and its dependencies up by setting the HADOOP_OPTIONAL_TOOLS env var; you can do this in ~/.hadoop-env, but just try it on your command line first.
After doing that, download the latest cloudstore jar and use its storediag command to attempt to connect to an abfs URL; it's the place to start debugging classpath and config issues.
https://github.com/steveloughran/cloudstore
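On the command line, the steps above might look roughly like the sketch below. The value hadoop-azure is my assumption, taken from the module name mentioned above. ABFS_URL and CLOUDSTORE_JAR are placeholder variables, not real values: point them at your own abfs:// URL and at wherever you saved the cloudstore jar.

```shell
# Ask the hadoop scripts to add the hadoop-azure module and its
# dependencies to the classpath (module name is an assumption).
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

# Placeholders (assumptions): set these to your own values.
ABFS_URL="${ABFS_URL:-}"
CLOUDSTORE_JAR="${CLOUDSTORE_JAR:-}"

if ! command -v hadoop >/dev/null 2>&1; then
  echo "hadoop not found on PATH"
elif [ -z "$ABFS_URL" ]; then
  echo "Set ABFS_URL to your abfs:// URL first"
else
  # Retry the listing from the question with the optional module enabled.
  hdfs dfs -ls "$ABFS_URL"
  # If a cloudstore jar has been downloaded, run its storediag command.
  if [ -n "$CLOUDSTORE_JAR" ]; then
    hadoop jar "$CLOUDSTORE_JAR" storediag "$ABFS_URL"
  fi
fi
```

Once this works interactively, the export line can be moved into ~/.hadoop-env so it applies to every hadoop command.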