
Hi, it seems I could not load data correctly from DBFS in Databricks using Auto Loader; at least it is not displaying the data — ‘Query returned no results’. Any help is welcome!

3 Answers


  1. Chosen as BEST ANSWER

    I just needed to save and read my data from the mnt folder instead.

    Changing the path to '/mnt/data-lake/data/autoloader-test/', for example, worked.
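
    For reference, here is a minimal sketch of the working setup after switching to the mount path (the schema and checkpoint folders, the table name, and the Parquet source format below are illustrative, not taken from my actual notebook):

    # Auto Loader reading from the mounted path instead of a /dbfs/... path (names are examples)
    df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation", "/mnt/data-lake/meta/autoloader-test/schema")
        .load("/mnt/data-lake/data/autoloader-test/"))

    (df.writeStream
        .option("checkpointLocation", "/mnt/data-lake/meta/autoloader-test/checkpoint")
        .trigger(once=True)
        .table("bronze_autoloader_test")
        .awaitTermination())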


  2. You need to remove /dbfs from your paths – this path is used for local file access, not for distributed access.

    P.S. Also, don’t put the checkpoint & schema evolution folders into subfolders of your landing zone.
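
    For illustration, a layout along those lines could look like this (the paths are made up; the variable names match the ones used in the answer below):

    # Landing zone holds only the incoming files; Auto Loader metadata lives outside it
    var_data_location       = "/mnt/data-lake/landing/logs/"                # files dropped by upstream systems
    var_schema_location     = "/mnt/data-lake/_autoloader/logs/schema"      # schema-evolution tracking
    var_checkpoint_location = "/mnt/data-lake/_autoloader/logs/checkpoint"  # stream checkpoint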

  3. As Alex mentioned, /dbfs is the driver node's local filesystem path. You can use the following code.

    # Ingest the files with Auto Loader and write them to a bronze Delta table
    (spark.readStream
        .format("cloudFiles")                                      # Auto Loader source
        .option("cloudFiles.format", "parquet")                    # format of the incoming files
        .option("cloudFiles.schemaLocation", var_schema_location)  # where the inferred schema is tracked
        .load(var_data_location)                                   # your data location (dbfs:/ or /mnt path, without the /dbfs prefix)
        .writeStream
        .option("checkpointLocation", var_checkpoint_location)
        .option("mergeSchema", "true")
        .trigger(once=True)                                        # process the available files once, then stop
        .table(var_raw_log_table)
        .awaitTermination())                                       # block until the run finishes

    # Then read the resulting table back as a regular (batch) DataFrame
    bronzeDF = spark.table(var_raw_log_table)
    