Hi, it seems I cannot load data correctly from DBFS in Databricks using Auto Loader. At least, it is not displaying the data: 'Query returned no results'. Any help is welcome!
3 Answers
I just needed to point to the /mnt folder and save and read my data from there.
Changing the path to '/mnt/data-lake/data/autoloader-test/', for example, worked.
You need to remove /dbfs from your paths – this path is used for local file access, not for distributed access.
P.S. Also, don’t put the checkpoint & schema evolution folders into subfolders of your landing zone.
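A minimal sketch of both points, in plain Python so it can be tried outside a cluster. The helper names `to_spark_path` and `is_under` are made up for illustration, and the checkpoint paths in the usage lines are hypothetical; only the landing path comes from this thread.

```python
import posixpath


def to_spark_path(path: str) -> str:
    """Drop a leading /dbfs so the path is suitable for Spark readers.

    /dbfs/... is the local-file view; Spark and Auto Loader expect the
    path without that prefix (for example /mnt/...).
    """
    prefix = "/dbfs"
    if path == prefix:
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path


def is_under(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies somewhere below it."""
    child = posixpath.normpath(child)
    parent = posixpath.normpath(parent)
    return child == parent or child.startswith(parent.rstrip("/") + "/")


# Landing path taken from the thread, written with the /dbfs prefix on purpose.
landing = to_spark_path("/dbfs/mnt/data-lake/data/autoloader-test/")

# Hypothetical checkpoint locations, for illustration only.
bad_checkpoint = "/mnt/data-lake/data/autoloader-test/_checkpoint"
good_checkpoint = "/mnt/data-lake/checkpoints/autoloader-test"

print(landing)
print(is_under(bad_checkpoint, landing))   # inside the landing zone: avoid
print(is_under(good_checkpoint, landing))  # outside the landing zone: fine
```

Running a check like `is_under` on the checkpoint and schema locations before starting the stream is one cheap way to catch a layout where they sit inside the landing zone.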
As Alex mentioned, /dbfs is the local file path on your driver node. You can use the following code.
# Run the stream once and wait for it to finish.
# awaitTermination() returns None, so the result is not assigned to bronzeDF.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", var_schema_location)
    .load(var_data_location)  # your DBFS location, without the /dbfs prefix
    .writeStream
    .option("checkpointLocation", var_checkpoint_location)
    .option("mergeSchema", "true")
    .trigger(once=True)
    .table(var_raw_log_table)
    .awaitTermination())

# Read the loaded data back from the target table.
bronzeDF = spark.table(var_raw_log_table)