
I have a list of Python dictionaries, like the example below, where each dictionary can have slightly different keys.

data = [
  {'name': 'Bob', 'age': 32},
  {'name': 'Sara', 'city': 'Dallas'},
  {'name': 'John', 'age': 45, 'city': 'Atlanta'}
]

I also have a Postgres table that contains all of the possible keys that are seen within this list of dictionaries (e.g.: name, age, city).

I am looking for an elegant solution to insert this data into my database efficiently. While I could iterate over the data line by line and insert each record individually, that doesn't scale well to my actual dataset, which includes millions of records.

I attempted to use the execute_values function from psycopg2, as seen in the example below, but that function expects all of the dictionaries to have the same keys.

How can I edit my process below to insert multiple dictionaries at once, where each dictionary can contain different keys?

import psycopg2
from psycopg2.extras import execute_values

# connect to the database
conn = psycopg2.connect(
    host="localhost",
    database="db_name",
    user="psql_user",
    password="psql_password",
)
conn.autocommit = True
cur = conn.cursor()

# get the columns from first dictionary
columns = data[0].keys()

# write the SQL query to insert the records
query = """INSERT INTO schema.table 
            ({}) VALUES %s
            ON CONFLICT (name) DO NOTHING""".format(
        ",".join(columns)
)

# extract the values from each dictionary into a list of lists
values = [list(line.values()) for line in data]

# execute the SQL query with the associated values
execute_values(cur, query, values)

3 Answers


  1. Assuming you know the complete list of columns (which you should, if you are filling in a table), you can transform your data to fill in the missing values:

    data = [
      {'name': 'Bob', 'age': 32},
      {'name': 'Sara', 'city': 'Dallas'},
      {'name': 'John', 'age': 45, 'city': 'Atlanta'}
    ]
    
    import psycopg2
    from psycopg2.extras import execute_values
    
    # connect to the database
    conn = psycopg2.connect(
        host="localhost",
        database="db_name",
        user="psql_user",
        password="psql_password",
    )
    conn.autocommit = True
    cur = conn.cursor()
    
    columns = ['name','age','city','state','zip']
    empty = {k:None for k in columns}
    
    # write the SQL query to insert the records, using named
    # placeholders so each row can be passed as a dictionary
    query = """INSERT INTO schema.table
                ({}) VALUES ({})
                ON CONFLICT (name) DO NOTHING""".format(
            ",".join(columns),
            ",".join("%({})s".format(c) for c in columns)
    )
    
    # fill in the missing keys of each dictionary with None
    values = []
    for row in data:
        x = empty.copy()
        x.update(row)
        values.append(x)
    
    # execute the SQL query with the associated values
    cur.executemany(query, values)
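    A side note: for millions of rows, execute_values can also consume these dictionaries directly if you keep the query as a single `VALUES %s` and pass a `template` of named placeholders, which preserves the batched, multi-row insert. A sketch of the template construction (pure Python, no database needed):

    ```python
    columns = ['name', 'age', 'city', 'state', 'zip']

    # build a per-row template such as "(%(name)s,%(age)s,...)";
    # execute_values substitutes each dictionary into this template
    template = "({})".format(",".join("%({})s".format(c) for c in columns))
    print(template)  # → (%(name)s,%(age)s,%(city)s,%(state)s,%(zip)s)
    ```

    This would then be passed as `execute_values(cur, query, values, template=template)`.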
    
  2. Merge the dictionaries with a pattern (a dict that contains None for all the columns):

    #...
    
    data = [
      {'name': 'Bob', 'age': 32},
      {'name': 'Sara', 'city': 'Dallas'},
      {'name': 'John', 'age': 45, 'city': 'Atlanta'}
    ]
    
    nulls = {'name': None, 'age': None, "city": None}
    
    # get the full set of columns by merging the pattern with the first dictionary
    columns = (nulls | data[0]).keys()
    
    # write the SQL query to insert the records
    query = """INSERT INTO schema.table
                ({}) VALUES %s
                ON CONFLICT (name) DO NOTHING""".format(
            ",".join(columns)
    )
    
    # extract the values from each dictionary into a list of lists
    values = [list((nulls | line).values()) for line in data]
    
    # execute the SQL query with the associated values
    execute_values(cur, query, values)
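    Note that the `nulls | line` merge uses the dict-union operator, which requires Python 3.9 or newer; on older versions the same merge can be written with unpacking:

    ```python
    nulls = {'name': None, 'age': None, 'city': None}
    line = {'name': 'Bob', 'age': 32}

    # {**a, **b} is the pre-3.9 spelling of a | b: the right side wins
    merged = {**nulls, **line}
    print(merged)  # → {'name': 'Bob', 'age': 32, 'city': None}
    ```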
    
  3. Using the psycopg2 sql module:

    create table books (id serial, bookcode integer, bookname text);
    
    import psycopg2
    from psycopg2 import sql
    
    data = [{'bookcode': 1, 'bookname': 'test'}, {'bookcode': 2}]
    
    con = psycopg2.connect("dbname=test host=localhost  user=postgres")
    cur = con.cursor()
    
    for d in data:
        col_names = list(d.keys())
        print(col_names)
        insert_qry = sql.SQL("insert into books ({})  values({})").format(sql.SQL(",").join(map(sql.Identifier, col_names)), 
                             sql.SQL(",").join(map(sql.Placeholder, col_names)))
        cur.execute(insert_qry, d)
    con.commit()
    
    select * from books;
     id | bookcode | bookname 
    ----+----------+----------
      1 |        1 | test
      2 |        2 | NULL
    
    

    This of course assumes the columns can take NULL values. If that is not the case then you need to create some appropriate values for the missing fields.
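    For example, a dictionary of fallback values (the defaults below are purely illustrative) can be merged under each row before inserting:

    ```python
    # hypothetical fallback values for columns that reject NULL
    defaults = {'bookcode': 0, 'bookname': 'unknown'}

    data = [{'bookcode': 1, 'bookname': 'test'}, {'bookcode': 2}]

    # each row overrides the defaults; missing keys keep the fallback
    rows = [{**defaults, **d} for d in data]
    print(rows)
    ```

    Each merged row can then be passed to the per-row insert loop above.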
