
I have a list of Python dictionaries, like the example below, where each dictionary can have slightly different keys.

data = [
  {'name': 'Bob', 'age': 32},
  {'name': 'Sara', 'city': 'Dallas'},
  {'name': 'John', 'age': 45, 'city': 'Atlanta'}
]

I also have a Postgres table that contains all of the possible keys that are seen within this list of dictionaries (e.g.: name, age, city).

I am looking for an elegant solution to insert this data into my database efficiently. While I could iterate over the data line by line and insert each record individually, that doesn't scale well to my actual dataset, which includes millions of records.

I attempted to use the execute_values function from psycopg2, as seen in the example below, but that function expects all of the dictionaries to have the same keys.

How can I edit my process below to insert multiple dictionaries at once, where each dictionary can contain different keys?

import psycopg2
from psycopg2.extras import execute_values

# connect to the database
conn = psycopg2.connect(
    host="localhost",
    database="db_name",
    user="psql_user",
    password="psql_password",
)
conn.autocommit = True
cur = conn.cursor()

# get the columns from first dictionary
columns = data[0].keys()

# write the SQL query to insert the records
query = """INSERT INTO schema.table 
            ({}) VALUES %s
            ON CONFLICT (name) DO NOTHING""".format(
        ",".join(columns)
)

# extract the values from each dictionary into a list of lists
values = [list(line.values()) for line in data]

# execute the SQL query with the associated values
execute_values(cur, query, values)

3 Answers


  1. Assuming you know the complete list of columns (which you should, if you are filling in a table), you can transform your data to fill in the missing values:

    data = [
      {'name': 'Bob', 'age': 32},
      {'name': 'Sara', 'city': 'Dallas'},
      {'name': 'John', 'age': 45, 'city': 'Atlanta'}
    ]
    
    import psycopg2
    from psycopg2.extras import execute_values
    
    # connect to the database
    conn = psycopg2.connect(
        host="localhost",
        database="db_name",
        user="psql_user",
        password="psql_password",
    )
    conn.autocommit = True
    cur = conn.cursor()
    
    columns = ['name','age','city','state','zip']
    empty = {k:None for k in columns}
    
    # write the SQL query to insert the records, using named
    # placeholders so each row can be passed as a dictionary
    query = """INSERT INTO schema.table
                ({}) VALUES ({})
                ON CONFLICT (name) DO NOTHING""".format(
            ",".join(columns),
            ",".join("%({})s".format(c) for c in columns)
    )
    
    # fill in the missing keys of each dictionary with None
    values = []
    for row in data:
        x = empty.copy()
        x.update(row)
        values.append(x)
    
    # execute the SQL query with the associated values
    cur.executemany(query, values)
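    A side note: for millions of rows, execute_values can also consume these dictionaries directly if you keep the query as a single `VALUES %s` and pass a `template` of named placeholders, which preserves the batched, multi-row insert. A sketch of the template construction (pure Python, no database needed):

    ```python
    columns = ['name', 'age', 'city', 'state', 'zip']

    # build a per-row template such as "(%(name)s,%(age)s,...)";
    # execute_values substitutes each dictionary into this template
    template = "({})".format(",".join("%({})s".format(c) for c in columns))
    print(template)  # → (%(name)s,%(age)s,%(city)s,%(state)s,%(zip)s)
    ```

    This would then be passed as `execute_values(cur, query, values, template=template)`.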
    
  2. Merge the dictionaries with a pattern (a dict that contains None for all the columns):

    #...
    
    data = [
      {'name': 'Bob', 'age': 32},
      {'name': 'Sara', 'city': 'Dallas'},
      {'name': 'John', 'age': 45, 'city': 'Atlanta'}
    ]
    
    nulls = {'name': None, 'age': None, "city": None}
    
    # get the full set of columns by merging the pattern with the first dictionary
    columns = (nulls | data[0]).keys()
    
    # write the SQL query to insert the records
    query = """INSERT INTO schema.table
                ({}) VALUES %s
                ON CONFLICT (name) DO NOTHING""".format(
            ",".join(columns)
    )
    
    # extract the values from each dictionary into a list of lists
    values = [list((nulls | line).values()) for line in data]
    
    # execute the SQL query with the associated values
    execute_values(cur, query, values)
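    Note that the `nulls | line` merge uses the dict-union operator, which requires Python 3.9 or newer; on older versions the same merge can be written with unpacking:

    ```python
    nulls = {'name': None, 'age': None, 'city': None}
    line = {'name': 'Bob', 'age': 32}

    # {**a, **b} is the pre-3.9 spelling of a | b: the right side wins
    merged = {**nulls, **line}
    print(merged)  # → {'name': 'Bob', 'age': 32, 'city': None}
    ```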
    
  3. Using the psycopg2 sql module:

    create table books (id serial, bookcode integer, bookname text);
    
    import psycopg2
    from psycopg2 import sql
    
    data = [{'bookcode': 1, 'bookname': 'test'}, {'bookcode': 2}]
    
    con = psycopg2.connect("dbname=test host=localhost  user=postgres")
    cur = con.cursor()
    
    for d in data:
        col_names = list(d.keys())
        print(col_names)
        insert_qry = sql.SQL("insert into books ({})  values({})").format(sql.SQL(",").join(map(sql.Identifier, col_names)), 
                             sql.SQL(",").join(map(sql.Placeholder, col_names)))
        cur.execute(insert_qry, d)
    con.commit()
    
    select * from books;
     id | bookcode | bookname 
    ----+----------+----------
      1 |        1 | test
      2 |        2 | NULL
    
    

    This of course assumes the columns can take NULL values. If that is not the case then you need to create some appropriate values for the missing fields.
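    For example, a dictionary of fallback values (the defaults below are purely illustrative) can be merged under each row before inserting:

    ```python
    # hypothetical fallback values for columns that reject NULL
    defaults = {'bookcode': 0, 'bookname': 'unknown'}

    data = [{'bookcode': 1, 'bookname': 'test'}, {'bookcode': 2}]

    # each row overrides the defaults; missing keys keep the fallback
    rows = [{**defaults, **d} for d in data]
    print(rows)
    ```

    Each merged row can then be passed to the per-row insert loop above.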
