
We are facing an issue with reading and writing streaming data into the target location.

We are working with JSON telemetry data for tracking steps. New data files land in our data lake every 5 seconds, and we need a way to automatically ingest them into a Delta Lake table.

2 Answers


  1. Hope this helps

    
        query = (spark.readStream
                      .format("cloudFiles")                    # Auto Loader source
                      .option("cloudFiles.format", "json")
                      .option("cloudFiles.schemaLocation", <schemaLocation>)
                      .load(<dataset_source>)
                      .writeStream
                      .format("delta")
                      .option("checkpointLocation", <checkpoint_path>)
                      .trigger(processingTime="<Provide the time>")  # e.g. "5 seconds" to match the arrival rate
                      .outputMode("append")  # "complete" mode only applies to aggregated streams
                      .toTable("table_name"))
    
    

    For more info, refer to https://docs.databricks.com/ingestion/auto-loader/index.html
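
    Once the stream is started, the StreamingQuery handle returned above can be used to monitor or stop it. A minimal sketch, using the query variable from the snippet:

        # Inspect the current status and the most recent micro-batch progress.
        print(query.status)
        print(query.lastProgress)

        # Optionally block the calling thread until the stream terminates.
        # query.awaitTermination()

        # Stop the stream gracefully when ingestion should end.
        query.stop()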

  2. If you only want to read a particular sub-folder, point the load at that path. For example, if the files land under /mnt/2023/01/13 and you only need the data for 2023/01, load it with load('/mnt/<folder>/<sub_folder>'), or use a wildcard such as /mnt/2023/* to pick up everything under 2023.

        # Returns a streaming DataFrame; chain .writeStream on it as in the first answer.
        df = (spark.readStream
                   .format("cloudFiles")
                   .option("cloudFiles.format", "json")
                   .option("cloudFiles.schemaLocation", <Location>)
                   .load('/mnt/<folder>/<sub_folder>'))
    
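    To show the wildcard form end to end, here is a minimal sketch; the schema and checkpoint paths and the target table name steps_2023 are assumptions, and /mnt/2023/* reads every month folder under 2023:

        # Sketch only: the paths and table name below are hypothetical.
        query = (spark.readStream
                      .format("cloudFiles")
                      .option("cloudFiles.format", "json")
                      .option("cloudFiles.schemaLocation", "/mnt/autoloader/schemas/2023")
                      .load("/mnt/2023/*")                     # every sub-folder under 2023
                      .writeStream
                      .format("delta")
                      .option("checkpointLocation", "/mnt/autoloader/checkpoints/2023")
                      .outputMode("append")
                      .toTable("steps_2023"))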