Within a UDF I want to read a Delta table into a DataFrame, update the row of the DataFrame on which the UDF is applied based on the Delta table's content, and then update the Delta table. I would use the UDF within a Structured Streaming foreachBatch. How is this possible?
from pyspark.sql import DataFrame
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

df_other = spark.read.format("delta").load(path)

@udf(StringType())
def my_udf(df_other: DataFrame) -> str:
    ...
    # some things to do based on df_other's content
    ...
    df_new_rows = ...
    df_new_rows.write.format("delta").mode("append").save(path)
    ...
    return "whatever"
2 Answers
Use the following UDF to read and update the delta table:
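A minimal sketch of such a function is given below, assuming it is invoked from foreachBatch (so it runs on the driver, where spark.read is available) rather than as a row-level UDF on the executors; the table location path, the join key id, the column value, and the name update_other_table are assumptions for illustration, not from the original answer:

from pyspark.sql import DataFrame

# `spark` is the active SparkSession (e.g. provided by Databricks / pyspark shell).
path = "/mnt/delta/other_table"  # assumed location of the Delta table

def update_other_table(batch_df: DataFrame, batch_id: int) -> None:
    # Read the current content of the Delta table (driver-side, once per micro-batch).
    df_other = spark.read.format("delta").load(path)

    # Update the incoming micro-batch rows based on df_other's content,
    # here by joining on an assumed shared "id" column.
    enriched = batch_df.join(df_other.select("id", "value"), on="id", how="left")

    # Derive the rows that should go back into the Delta table
    # and append them, as in the question's sketch.
    df_new_rows = enriched.select("id", "value")
    df_new_rows.write.format("delta").mode("append").save(path)

If true in-place updates are needed instead of appends, the Delta Lake merge API (DeltaTable.forPath(spark, path).merge(...) with whenMatchedUpdateAll / whenNotMatchedInsertAll) can be used in place of the append.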
It reads and updates the DataFrame successfully, as the before/after state of the Delta table shows:
Delta table (before the update):
Updated Delta table (after the update):
Here is the complete code for your reference:
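The complete code below is likewise a sketch under assumptions (the source path, checkpoint location, column names, and query wiring are illustrative); it shows how the batch function plugs into a Structured Streaming query via foreachBatch:

from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.getOrCreate()

path = "/mnt/delta/other_table"                    # assumed: Delta table to read and update
source_path = "/mnt/delta/streaming_source"        # assumed: streaming input table
checkpoint_path = "/mnt/checkpoints/update_other"  # assumed: checkpoint location

def update_other_table(batch_df: DataFrame, batch_id: int) -> None:
    # Runs on the driver once per micro-batch, so spark.read can be used here.
    df_other = spark.read.format("delta").load(path)
    # Enrich the micro-batch with the Delta table's content over an assumed "id" key.
    df_new_rows = (
        batch_df.join(df_other.select("id", "value"), on="id", how="left")
                .select("id", "value")
    )
    # Append the derived rows back to the Delta table.
    df_new_rows.write.format("delta").mode("append").save(path)

stream_df = spark.readStream.format("delta").load(source_path)

query = (
    stream_df.writeStream
    .foreachBatch(update_other_table)
    .option("checkpointLocation", checkpoint_path)
    .start()
)
query.awaitTermination()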
You can refer to this approach for using such a UDF-style function within Structured Streaming.