
I'm new to Azure Lakehouse and Warehouse. I'm trying to grasp the concepts in a Microsoft Fabric workspace so that I can choose either a Lakehouse or a Warehouse as my destination.

I understand that a Lakehouse is a combination of a data lake and a warehouse, and hence supports both structured and unstructured data.

I also know that the Lakehouse is based on the Spark engine, and the Warehouse is based on the SQL engine.

Thus, PySpark is used for transformations in the Lakehouse, and SQL is used for transformations in the Warehouse.

I also know that the Lakehouse supports SQL queries (SELECT/INSERT/UPDATE/DELETE) via PySpark, and simple SELECT queries using %%sql.

I want to ask:

  1. Can I write data transformation SQL (INSERT/UPDATE/DELETE) using %%sql in Lakehouse?
  2. Can I write data transformations using Python/PySpark or notebooks in a Warehouse?

2 Answers


  1. For your two questions:

    Can I write data transformation SQL (INSERT/UPDATE/DELETE) using %%sql
    in Lakehouse?

    Yes, you can do data transformations in Spark SQL beyond SELECTs, for example MERGE, INSERT and DELETE as supported by the Spark SQL API, but the target tables have to be Delta tables.
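    As a minimal sketch (the table name sample_delta_table is hypothetical and assumed to already exist as a Delta table in the Lakehouse), the same statements can be issued from PySpark with spark.sql, or equivalently in a %%sql cell:

    # Spark SQL DML against a Lakehouse Delta table, issued from PySpark;
    # the same statements also work in a %%sql cell.
    spark.sql("INSERT INTO sample_delta_table VALUES (5, 'Asha', 30)")
    spark.sql("UPDATE sample_delta_table SET age = 31 WHERE name = 'Asha'")
    spark.sql("DELETE FROM sample_delta_table WHERE name = 'Asha'")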

    Can I write data transformations using Python/PySpark or notebooks in a
    Warehouse?

    Not directly; the Warehouse is limited to the supported T-SQL features. You can fire up a notebook in the Lakehouse, connect to the Warehouse over a JDBC connection, and run T-SQL wrapped in Python to perform transformations (see the sketch below), but there is no direct support for running a Python command in the Warehouse experience.
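    The answer mentions JDBC; a rough Python sketch of the same idea via ODBC (pyodbc) against the Warehouse's T-SQL endpoint is below. The connection-string values and the table name are placeholders, not verified Fabric settings; the real SQL connection string can be copied from the Warehouse settings in the portal.

    import pyodbc  # assumes a SQL Server ODBC driver is available to the notebook environment

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<warehouse-sql-connection-string>;"    # placeholder
        "DATABASE=<your_warehouse>;"                   # placeholder
        "Authentication=ActiveDirectoryInteractive;"   # assumption; use the auth method your tenant allows
    )

    with pyodbc.connect(conn_str, autocommit=True) as conn:
        cur = conn.cursor()
        # The transformation itself is plain T-SQL, executed by the Warehouse engine.
        cur.execute("UPDATE dbo.sample_table SET age = 32 WHERE name = 'Dilip';")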

    • In a notebook within the Lakehouse, you can execute SELECT queries using the %%sql magic command. However, data transformation queries such as UPDATE and DELETE are currently not supported directly with %%sql in the Lakehouse.

    • The Warehouse in Microsoft Fabric is based on the SQL engine, which is optimized for relational queries and transformations using T-SQL. So, you cannot directly use PySpark or Spark-based transformations in the Warehouse.

    I performed the update using PySpark in a notebook, as in the example below:

    from pyspark.sql.functions import when

    # Read the Delta table, update the matching row, and write the result back.
    df = spark.read.format("delta").load("/lakehouse/delta/d01")
    df_updated = df.withColumn("age",
                    when(df.name == "Dilip", 32).otherwise(df.age))
    df_updated.show()
    df_updated.write.format("delta").mode("overwrite").save("/lakehouse/delta/d01")
    

    Results:

    
    +---+-------+---+
    | id|   name|age|
    +---+-------+---+
    |  3|Narayan| 40|
    |  1|  Dilip| 32|
    |  2|    Raj| 25|
    +---+-------+---+
    

    When I tried to perform the UPDATE and DELETE using %%sql:

    %%sql
    UPDATE sample_table
    SET age = 32
    WHERE name = 'dilip';
    

    Error:

    UPDATE destination only supports Delta sources. Some(HiveTableRelation
    [spark_catalog.default.sample_table,
    org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
    [id#1336, name#1337, age#1338], Partition Cols: []] )

    %%sql
    DELETE FROM sample_table
    WHERE name = 'narayan';
    

    Error:

    [INTERNAL_ERROR] Unexpected table relation: HiveTableRelation
    [spark_catalog.default.sample_table,
    org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
    [id#1342, name#1343, age#1344], Partition Cols: []]
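
    Both errors point to the same cause: sample_table was registered as a plain Hive table, and Spark SQL UPDATE/DELETE only work against Delta tables (as the first answer notes). If the goal is to make %%sql DML work, one option is to re-register the data as a Delta table first. A minimal sketch, where sample_table_delta is a new name introduced here only for illustration:

    # Re-create the table as a managed Delta table so Spark SQL DML can target it.
    df = spark.read.table("sample_table")
    df.write.format("delta").mode("overwrite").saveAsTable("sample_table_delta")
    spark.sql("UPDATE sample_table_delta SET age = 32 WHERE name = 'Dilip'")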

    I performed the INSERT and DELETE using PySpark, since the underlying data is stored in Delta format.

    INSERT Script:

    from pyspark.sql import Row

    new_data = [Row(id=4, name="Dileep", age=28)]
    new_df = spark.createDataFrame(new_data)
    df_existing = spark.read.format("delta").load("/lakehouse/delta/d01")
    df_combined = df_existing.union(new_df)
    df_combined.write.format("delta").mode("overwrite").save("/lakehouse/delta/d01")
    

    DELETE Script:

    df = spark.read.format("delta").load("/lakehouse/delta/d01")
    df_filtered = df.filter(df.name != "Narayan")
    df_filtered.write.format("delta").mode("overwrite").save("/lakehouse/delta/d01")
    

    Final Results:

    +---+------+---+
    | id|  name|age|
    +---+------+---+
    |  4|Dileep| 28|
    |  1| Dilip| 32|
    |  2|   Raj| 25|
    +---+------+---+
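
    As a side note, the overwrite-based INSERT/UPDATE/DELETE above can also be done in place with the Delta Lake DeltaTable API. A sketch assuming the delta-spark package available in the Fabric Spark runtime, reusing the path and the new_df DataFrame from the scripts above:

    from delta.tables import DeltaTable

    dt = DeltaTable.forPath(spark, "/lakehouse/delta/d01")
    # Update and delete rows in place instead of rewriting the whole table.
    dt.update(condition="name = 'Dilip'", set={"age": "32"})
    dt.delete("name = 'Narayan'")
    # New rows can simply be appended rather than unioned and overwritten.
    new_df.write.format("delta").mode("append").save("/lakehouse/delta/d01")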
    