
I have this tableConfig.json in an ADLS container-based location.

It has table-specific details.

{
  "tableName": "employee",
  "databaseName": "dbo",
  "location": "/mnt/clean/demo",
  "colsList": ["emp_id", "emp_name", "emp_city"]
}

Now I want to read that tableConfig.json in an Azure Databricks Python notebook, dynamically build the CREATE TABLE statement from it, and execute it so that a delta table is created:

CREATE TABLE <databaseName.tableName>
(<colsList>)
USING delta
LOCATION <location> 

Is there any approach to do this?

Basically, I want a notebook that, when executed, creates the delta table as per the JSON config.

2 Answers


  1. I don't think there is an option to create a table directly from the config file yet, but we can iterate through the JSON (dict) and create the table.

    Example:

    abc = [{
        "tableName": "employee",
        "databaseName": "dbo",
        "location": "/mnt/clean/demo",
        "colsList": ["emp_id", "emp_name", "emp_city"]
    }]

    for i in abc:
        # Printed here for illustration; call spark.sql(...) directly to actually run it
        print(f"""spark.sql("CREATE TABLE {i['databaseName']}.{i['tableName']} USING delta location '{i['location']}'")""")

    #spark.sql("CREATE TABLE dbo.employee USING delta location '/mnt/clean/demo'")
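    Since the config actually lives in ADLS rather than being hardcoded, here is a minimal sketch of reading it from a mounted path and executing the full statement, columns included. The path /dbfs/mnt/clean/config/tableConfig.json and the string column type are assumptions for illustration, not part of the answer above.

    import json

    # Hypothetical DBFS mount path for the config file -- adjust to where tableConfig.json actually sits
    config_path = "/dbfs/mnt/clean/config/tableConfig.json"
    with open(config_path) as f:
        cfg = json.load(f)

    # Column types are not in the config, so "string" is assumed here for every column
    cols = ", ".join(f"{c} string" for c in cfg["colsList"])

    # spark is the ambient SparkSession in a Databricks notebook
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {cfg['databaseName']}.{cfg['tableName']} ({cols})
        USING delta
        LOCATION '{cfg['location']}'
    """)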
    
  2. I agree with @notNull on using the spark.sql function to create the table. In addition to that, you can follow the dataframe-based approach below.

    First, load the JSON data into a dataframe, then follow the steps below.

    If you only have these columns in the list, you can generate a SQL script for each record in the dataframe and execute the spark.sql function on each of them.

    Below is the sample data that I used, loaded into a dataframe.

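    A minimal sketch of loading the config into a dataframe could look like the following; the multiLine option is needed because the JSON spans multiple lines, and the path is an assumed mount location, not one from the answer.

    # Assumed mount path to the config file; swap in your actual ADLS location
    df = (spark.read
               .option("multiLine", "true")
               .json("/mnt/clean/config/tableConfig.json"))
    display(df)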

    Create a function that builds the SQL script and register it as a UDF.

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Column types are hardcoded (int, string, string) to match the sample config;
    # the trailing semicolon is dropped so spark.sql can parse the script
    def gen_sql_script(tbl_name, db_name, loc, col_list):
        script = f"""
        CREATE TABLE {db_name}.{tbl_name}
        ({col_list[0]} int, {col_list[1]} string, {col_list[2]} string)
        USING delta
        LOCATION '{loc}'
        """
        return script

    get_sql = udf(gen_sql_script, StringType())
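    As a quick sanity check, the plain Python function can be called with the values from the question's config to preview the script it generates:

    # Preview the generated DDL using the values from the question's config
    print(gen_sql_script("employee", "dbo", "/mnt/clean/demo",
                         ["emp_id", "emp_name", "emp_city"]))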
    

    Next, add a new column called script by calling the above UDF.

    df = df.withColumn("script", get_sql("tableName", "databaseName", "location", "colsList"))
    display(df)
    


    Next, take only the script column and run the spark.sql function on each value.

    for i in df.select("script").collect():
        print(i['script'])
        spark.sql(i['script'])
    


    This creates the delta table at the given location.
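    To verify, you can, for example, describe the new table and read it back; DESCRIBE DETAIL reports the table's format and location:

    # Optional check: confirm the table exists and points at the expected location
    display(spark.sql("DESCRIBE DETAIL dbo.employee"))
    display(spark.table("dbo.employee"))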

