skip to Main Content

I have an Excel file in the azure datalake ,I have read the excel file like the following

ddff=spark.read.format("com.crealytics.spark.excel") 
               .option("header", "true")
               .option("sheetName","__all__")
               .option("inferSchema","true")
               .load("abfss://[email protected]/file.xlsx")

Now Iam confused how to get just the sheetnames from that Excel file,is there any direct function to do that ?

2

Answers


  1. As per the github link for spark-excel.. the following code should work – please try it…Code directly taken from the github page.

    import com.crealytics.spark.excel.WorkbookReader
    val sheetNames = WorkbookReader( Map("path" -> "Worktime.xlsx")
                                   , spark.sparkContext.hadoopConfiguration
                                   ).sheetNames
    val df = spark.read.excel(
      header = true,
      dataAddress = sheetNames(0)
    )
    
    Login or Signup to reply.
  2. I reproduce the same in my environment with sample data. I got this output.

    Please follow the below steps. To resolve this issue :

    Step1: Mount your storage account with Azure data bricks:

    dbutils.fs.mount(
        source = "wasbs://<container_name>@<stoarge_account_name>.blob.core.windows.net/",
        mount_point = "/mnt/<mount_path>",
        extra_configs = {"fs.azure.account.key.<stoarge_account_name>.blob.core.windows.net":"<Access_key>"})
    

    Step 2: Use dbutils.fs.ls to access the whole mount path:

    dbutils.fs.ls("/mnt/<mount_path>")
    

    enter image description here

    Note: Use "/dbfs", instead of "dbfs:": The function expects a local file path

    For example, This is my actual file path: dbfs:/mnt/dem/read-employees-csv.xlsx. I modified the file path like this /dbfs/mnt/dem/read-employees-csv.xlsx. And also make sure to install openpyxl and fsspec . Use this code to install packages : pip install openpyxland pip install fsspec

    Now, execute the below code. You will get the list of sheets:

    Code:

    import pandas
    x1 = pandas.ExcelFile('/dbfs/mnt/dem/read-employees-csv.xlsx')
    print(x1.sheet_names)
    

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search