
We are facing an issue with reading and writing streaming data into the target location.

We are working with JSON telemetry data for tracking steps. New data files land in our data lake every 5 seconds, and we need a way to automatically ingest them into a Delta Lake table.

2 Answers


  1. Hope this helps

    
        query = (spark.readStream
                      .format("cloudFiles")                    # Auto Loader source
                      .option("cloudFiles.format", "json")
                      .option("cloudFiles.schemaLocation", <schemaLocation>)
                      .load(<dataset_source>)
                      .writeStream
                      .format("delta")
                      .option("checkpointLocation", <checkpoint_path>)
                      .trigger(processingTime="<Provide the time>")  # e.g. "5 seconds" to match the arrival rate
                      .outputMode("append")  # "complete" mode only applies to aggregated streams
                      .toTable("table_name"))
    
    

    For more info, refer to https://docs.databricks.com/ingestion/auto-loader/index.html
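
    Once the stream is started, the StreamingQuery handle returned above can be used to monitor or stop it. A minimal sketch, using the query variable from the snippet:

        # Inspect the current status and the most recent micro-batch progress.
        print(query.status)
        print(query.lastProgress)

        # Optionally block the calling thread until the stream terminates.
        # query.awaitTermination()

        # Stop the stream gracefully when ingestion should end.
        query.stop()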

  2. If you only want to read a particular sub-folder, point the load at that path. For example, if the files land under /mnt/2023/01/13 and you only need the data for 2023/01, load it with load('/mnt/<folder>/<sub_folder>'), or use a wildcard such as /mnt/2023/* to pick up everything under 2023.

        # Returns a streaming DataFrame; chain .writeStream on it as in the first answer.
        df = (spark.readStream
                   .format("cloudFiles")
                   .option("cloudFiles.format", "json")
                   .option("cloudFiles.schemaLocation", <Location>)
                   .load('/mnt/<folder>/<sub_folder>'))
    
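    To show the wildcard form end to end, here is a minimal sketch; the schema and checkpoint paths and the target table name steps_2023 are assumptions, and /mnt/2023/* reads every month folder under 2023:

        # Sketch only: the paths and table name below are hypothetical.
        query = (spark.readStream
                      .format("cloudFiles")
                      .option("cloudFiles.format", "json")
                      .option("cloudFiles.schemaLocation", "/mnt/autoloader/schemas/2023")
                      .load("/mnt/2023/*")                     # every sub-folder under 2023
                      .writeStream
                      .format("delta")
                      .option("checkpointLocation", "/mnt/autoloader/checkpoints/2023")
                      .outputMode("append")
                      .toTable("steps_2023"))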