I have a list of Python dictionaries that looks something like the example below, where each dictionary can have slightly different keys:
data = [
    {'name': 'Bob', 'age': 32},
    {'name': 'Sara', 'city': 'Dallas'},
    {'name': 'John', 'age': 45, 'city': 'Atlanta'}
]
I also have a Postgres table that contains columns for all of the possible keys seen within this list of dictionaries (e.g. name, age, city).
I am looking for an elegant solution to insert this data into my database efficiently. While I could iterate over data record by record and insert each one individually, that approach doesn't scale well to my actual dataset, which contains millions of records.
I attempted to use the execute_values function from psycopg2, as seen in the example below, but it expects all of the dictionaries to have the same keys.
How can I edit my process below to insert multiple dictionaries at once, where each dictionary can contain different keys?
import psycopg2
from psycopg2.extras import execute_values

# connect to the database
conn = psycopg2.connect(
    host="localhost",
    database="db_name",
    user="psql_user",
    password="psql_password",
)
conn.autocommit = True
cur = conn.cursor()

# get the columns from the first dictionary
columns = data[0].keys()

# write the SQL query to insert the records
query = """INSERT INTO schema.table ({})
VALUES %s
ON CONFLICT (name) DO NOTHING""".format(",".join(columns))

# extract the values from each dictionary as a list of lists
values = [[value for value in line.values()] for line in data]

# execute the SQL query with the associated values
execute_values(cur, query, values)
3 Answers
Assuming you know the complete list of columns (which you should, if you are filling in a table), you can transform your data to fill in the missing values:
Merge the dictionaries with a pattern (a dict that contains None for all the columns), as in the sketch below.
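A minimal sketch of that merge, reusing data, cur, and schema.table from the question and assuming the full column list (name, age, city) is known up front:

from psycopg2.extras import execute_values

# every column in the target table, in a fixed order
columns = ['name', 'age', 'city']

# pattern dict: every column mapped to None
pattern = dict.fromkeys(columns)

# merge each record into the pattern; keys present in the record
# override the Nones, missing keys stay None
merged = [{**pattern, **line} for line in data]

# pull the values out in a consistent column order
values = [[row[col] for col in columns] for row in merged]

query = """INSERT INTO schema.table ({})
VALUES %s
ON CONFLICT (name) DO NOTHING""".format(",".join(columns))

execute_values(cur, query, values)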
Alternatively, compose the statement with the psycopg2 sql module, so that the table and column names are quoted as identifiers rather than formatted into the string by hand.
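A sketch of the same insert built with psycopg2.sql, again reusing data and cur from the question; note that execute_values accepts a composed query in psycopg2 2.8 and later:

from psycopg2 import sql
from psycopg2.extras import execute_values

columns = ['name', 'age', 'city']

# quote the table and column names as SQL identifiers
query = sql.SQL(
    "INSERT INTO {table} ({fields}) VALUES %s ON CONFLICT (name) DO NOTHING"
).format(
    table=sql.Identifier("schema", "table"),
    fields=sql.SQL(",").join(map(sql.Identifier, columns)),
)

# dict.get returns None for missing keys, filling the gaps
values = [[line.get(col) for col in columns] for line in data]
execute_values(cur, query, values)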
This of course assumes the columns can take NULL values. If that is not the case then you need to create some appropriate values for the missing fields.
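For example, assuming (hypothetically) a schema where every column is NOT NULL, per-column defaults could stand in for the single None pattern:

# hypothetical per-column defaults for columns that reject NULL
defaults = {'name': '', 'age': 0, 'city': 'unknown'}
values = [[line.get(col, defaults[col]) for col in columns] for line in data]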