
I have this tableConfig.json in an ADLS container-based location.

It has table-specific details.

{
  "tableName": "employee",
  "databaseName": "dbo",
  "location": "/mnt/clean/demo",
  "colsList": ["emp_id", "emp_name", "emp_city"]
}

Now I want to read that tableConfig.json in an Azure Databricks Python notebook, dynamically build the CREATE TABLE statement from it, and execute it so that a delta table is created:

CREATE TABLE <databaseName.tableName>
(<colsList>)
USING delta
LOCATION <location> 

Is there any approach to do this?

Basically, I want a notebook that, when executed, creates the delta table as per the JSON config.

2 Answers


  1. I don't think there is an option to create a table directly from the config file yet, but we can iterate through the JSON (dict) and create the table.

    Example:

    abc = [{
        "tableName": "employee",
        "databaseName": "dbo",
        "location": "/mnt/clean/demo",
        "colsList": ["emp_id", "emp_name", "emp_city"]
    }]

    for i in abc:
        # Printed here for illustration; call spark.sql(...) directly to actually run it
        print(f"""spark.sql("CREATE TABLE {i['databaseName']}.{i['tableName']} USING delta location '{i['location']}'")""")

    #spark.sql("CREATE TABLE dbo.employee USING delta location '/mnt/clean/demo'")
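    Since the config actually lives in ADLS rather than being hardcoded, here is a minimal sketch of reading it from a mounted path and executing the full statement, columns included. The path /dbfs/mnt/clean/config/tableConfig.json and the string column type are assumptions for illustration, not part of the answer above.

    import json

    # Hypothetical DBFS mount path for the config file -- adjust to where tableConfig.json actually sits
    config_path = "/dbfs/mnt/clean/config/tableConfig.json"
    with open(config_path) as f:
        cfg = json.load(f)

    # Column types are not in the config, so "string" is assumed here for every column
    cols = ", ".join(f"{c} string" for c in cfg["colsList"])

    # spark is the ambient SparkSession in a Databricks notebook
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {cfg['databaseName']}.{cfg['tableName']} ({cols})
        USING delta
        LOCATION '{cfg['location']}'
    """)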
    
  2. I agree with @notNull on using the spark.sql function to create the table. In addition to that, you can follow the dataframe-based approach below.

    First, load the JSON data into a dataframe, then follow the steps below.

    If you only have these columns in the list, you can generate a SQL script for each record in the dataframe and execute the spark.sql function on each of them.

    Below is the sample data that I used, loaded into a dataframe.

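    A minimal sketch of loading the config into a dataframe could look like the following; the multiLine option is needed because the JSON spans multiple lines, and the path is an assumed mount location, not one from the answer.

    # Assumed mount path to the config file; swap in your actual ADLS location
    df = (spark.read
               .option("multiLine", "true")
               .json("/mnt/clean/config/tableConfig.json"))
    display(df)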

    Create a function that builds the SQL script and register it as a UDF.

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Column types are hardcoded (int, string, string) to match the sample config;
    # the trailing semicolon is dropped so spark.sql can parse the script
    def gen_sql_script(tbl_name, db_name, loc, col_list):
        script = f"""
        CREATE TABLE {db_name}.{tbl_name}
        ({col_list[0]} int, {col_list[1]} string, {col_list[2]} string)
        USING delta
        LOCATION '{loc}'
        """
        return script

    get_sql = udf(gen_sql_script, StringType())
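    As a quick sanity check, the plain Python function can be called with the values from the question's config to preview the script it generates:

    # Preview the generated DDL using the values from the question's config
    print(gen_sql_script("employee", "dbo", "/mnt/clean/demo",
                         ["emp_id", "emp_name", "emp_city"]))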
    

    Next, add a new column called script by calling the above UDF.

    df = df.withColumn("script", get_sql("tableName", "databaseName", "location", "colsList"))
    display(df)
    


    Next, take only the script column and run the spark.sql function on each value.

    for i in df.select("script").collect():
        print(i['script'])
        spark.sql(i['script'])
    


    This creates the delta table at the given location.
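    To verify, you can, for example, describe the new table and read it back; DESCRIBE DETAIL reports the table's format and location:

    # Optional check: confirm the table exists and points at the expected location
    display(spark.sql("DESCRIBE DETAIL dbo.employee"))
    display(spark.table("dbo.employee"))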

