
I’m using a notebook on Azure Databricks; the notebook lives in my user repo. I want to write a CSV file created by this notebook into that repo.
When I use the code below:

df_pandas.to_csv('test.csv', index=False, header=False)

there is no error, but the file is not written in the notebook’s repo.

Does anyone have a clue?

I’ve tried writing the complete path as well as just the CSV file name, but I still get the same error:

Cannot save file into a non-existent directory: '/Users/*********/repo_one/repo_two'
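(For reference, this message is the standard pandas error raised when the parent directory of the target path does not exist; on an ordinary filesystem it can be avoided by creating the directory first. A minimal local sketch, not Databricks-specific, using a temporary directory rather than the real path:)

```python
import os
import tempfile

import pandas as pd

df_pandas = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# A target path whose parent directories do not exist yet
base = tempfile.mkdtemp()
target = os.path.join(base, 'repo_one', 'repo_two', 'test.csv')

# Create the parent directory first, then write the CSV
os.makedirs(os.path.dirname(target), exist_ok=True)
df_pandas.to_csv(target, index=False, header=False)

print(os.path.exists(target))  # the file now exists
```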

2 Answers


  1. Chosen as BEST ANSWER

    Hi, thanks for the explanation. But do you know how to write the CSV file not to a DBFS path, but somewhere I can retrieve it here, in the Workspace folder where my notebook is:

    [screenshot: the Workspace folder containing the notebook]

    Thanks for the help again !


    • I created a sample Pandas dataframe df_pandas with two columns, "name" and "age", and three rows of data. Then I write the dataframe to a CSV file named "test.csv" in the Databricks File System (DBFS).
    • The toPandas() method converts the Spark dataframe back to a Pandas dataframe, and to_csv() renders the Pandas dataframe as a CSV string. The dbutils.fs.put() method writes that CSV string to the specified file path in DBFS.

    Here is the code:

    import pandas as pd

    # Sample Pandas dataframe: two columns, three rows
    data = {'name': ['John', 'Jane', 'Bob'], 'age': [25, 30, 35]}
    df_pandas = pd.DataFrame(data)
    df_spark = spark.createDataFrame(df_pandas)

    # Render the data as a CSV string and write it to DBFS (True = overwrite)
    dbutils.fs.put("/Users/Dilip/repo_one/repo_two/test.csv",
                   df_spark.toPandas().to_csv(index=False, header=False),
                   True)
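    A side note on why the dbutils.fs.put() pattern works: when to_csv() is called with no path argument, pandas returns the CSV content as a string instead of writing a file, and that string is what gets handed to dbutils.fs.put(). A minimal local sketch with the same sample data:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob'], 'age': [25, 30, 35]}
df_pandas = pd.DataFrame(data)

# With no path argument, to_csv() returns the CSV text as a string
csv_text = df_pandas.to_csv(index=False, header=False)
print(csv_text)
```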
    

