Azure - How to extract Sheet names from Excel file using "com.crealytics.spark.excel" in Databricks (PySpark)

AzSuryaTeja
April 5, 2023
333 views
0 votes
2 Answers

I have an Excel file in the azure datalake ,I have read the excel file like the following

ddff=spark.read.format("com.crealytics.spark.excel") 
               .option("header", "true")
               .option("sheetName","__all__")
               .option("inferSchema","true")
               .load("abfss://[email protected]/file.xlsx")

Now Iam confused how to get just the sheetnames from that Excel file,is there any direct function to do that ?

Answers

- rainingdistros
- April 6, 2023 at 8:37 am
- 0 votes
0
As per the github link for spark-excel.. the following code should work – please try it…Code directly taken from the github page.
```
import com.crealytics.spark.excel.WorkbookReader
val sheetNames = WorkbookReader( Map("path" -> "Worktime.xlsx")
                               , spark.sparkContext.hadoopConfiguration
                               ).sheetNames
val df = spark.read.excel(
  header = true,
  dataAddress = sheetNames(0)
)
```
Login or Signup to reply.

- SaiVamsi
- April 6, 2023 at 10:24 am
- 0 votes
0
I reproduce the same in my environment with sample data. I got this output.

Please follow the below steps. To resolve this issue :

Step1: Mount your storage account with Azure data bricks:
```
dbutils.fs.mount(
    source = "wasbs://<container_name>@<stoarge_account_name>.blob.core.windows.net/",
    mount_point = "/mnt/<mount_path>",
    extra_configs = {"fs.azure.account.key.<stoarge_account_name>.blob.core.windows.net":"<Access_key>"})
```
Step 2: Use dbutils.fs.ls to access the whole mount path:
```
dbutils.fs.ls("/mnt/<mount_path>")
```
Note: Use "/dbfs", instead of "dbfs:": The function expects a local file path

For example, This is my actual file path: dbfs:/mnt/dem/read-employees-csv.xlsx. I modified the file path like this /dbfs/mnt/dem/read-employees-csv.xlsx. And also make sure to install openpyxl and fsspec . Use this code to install packages : pip install openpyxland pip install fsspec

Now, execute the below code. You will get the list of sheets:

Code:
```
import pandas
x1 = pandas.ExcelFile('/dbfs/mnt/dem/read-employees-csv.xlsx')
print(x1.sheet_names)
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Azure – How to extract Sheet names from Excel file using "com.crealytics.spark.excel" in Databricks (PySpark)

Answers