  • Using execute: 40 inserts per minute
  • Using executemany: 41 inserts per minute
  • Using extras.execute_values: 42 inserts per minute
import datetime
from typing import Any

import psycopg2
from psycopg2 import extras


def save_return_to_postgres(record_to_insert) -> Any:

    # Insert a single row and return its generated id.
    insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
             VALUES %s returning id"""

    records = (record_to_insert[2],record_to_insert[1],record_to_insert[3],record_to_insert[4],record_to_insert[0],datetime.datetime.now())

    # df = df[["description","code","unit","price","created_date","updated_date"]]

    
    conn = None

    try:
        
        conn = psycopg2.connect(database = 'xxxx',
                        user = 'xxxx',
                        password = 'xxxxx',
                        host= 'xxxx',
                        port='xxxx',
                        connect_timeout = 10)

        print("Connection Opened with Postgres")

        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        
        conn.commit()

        # print(record_to_insert)

    finally:
        
        if conn:
            cursor.close()
            conn.close()
            print("Connection to postgres was successfully closed")

# df is assumed to be a pandas DataFrame built elsewhere; this loop issues one
# INSERT (and opens one connection) per row.
valores = df.values

for valor in valores:
    save_return_to_postgres(valor)
    print(valor)

2 Answers


  1. I don’t know how many rows per INSERT Postgres can take.

    But many SQL-based databases can accept multiple rows in a single statement.

    So instead of running

    # pseudocode: one statement (and one round trip) per row
    for insert_query in queries:
        sql_execute(insert_query)

    try making several inserts at once in a single command
    (test it in pure SQL first to see if it works):

    # pseudocode: collect the rows and send them all in one command
    insert_list = []
    for insert_query in queries:
        insert_list.append(insert_query)
    sql_execute(insert_list)
    

    I had a similar issue and this link helped me:
    https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-multiple-rows/
    (of course mine was not Postgres, but the idea is the same:
    reduce network time by running multiple inserts in one command).
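
    In psycopg2 terms, a minimal sketch of that idea could look like the following
    (it reuses the question’s pricing.xxxx placeholders; the sample rows are made up):

    import psycopg2

    rows = [
        ("desc 1", "code1", "unit", 9.99, "2023-01-01", "2023-01-01"),
        ("desc 2", "code2", "unit", 4.50, "2023-01-01", "2023-01-01"),
    ]

    conn = psycopg2.connect(database='xxxx', user='xxxx', password='xxxxx',
                            host='xxxx', port='xxxx')
    with conn, conn.cursor() as cursor:
        # Build one INSERT that carries every row instead of one INSERT per row.
        values = ",".join(
            cursor.mogrify("(%s,%s,%s,%s,%s,%s)", row).decode("utf-8") for row in rows
        )
        cursor.execute(
            "INSERT INTO pricing.xxxx "
            "(description,code,unit,price,created_date,updated_date) VALUES " + values
        )
    conn.close()

    (This is essentially what psycopg2’s extras.execute_values helper automates,
    as the next answer shows.)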

    We’re in this together.

  2. Use execute_batch or execute_values and apply them over the entire record set. As it stands you are not getting the batch capabilities of execute_values, because you insert a single record at a time. You slow things down further by opening and closing a connection for each record, which is a time/resource-expensive operation. The code below is untested, as I don’t have the actual data and am assuming what df.values is.

    insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
                 VALUES %s returning id"""
    #execute_batch query
    #insert_query = """INSERT INTO pricing.xxxx #(description,code,unit,price,created_date,updated_date)
    #             VALUES (%s, %s, %s, %s, %s, %s) returning id"""
    
    valores = df.values
    #Create a list of lists to pass to query as a batch instead of singly.
    records = [[record_to_insert[2],record_to_insert[1],record_to_insert[3],
                record_to_insert[4],record_to_insert[0],datetime.datetime.now()] 
               for record_to_insert in valores]
    
    try:
            
            conn = psycopg2.connect(database = 'xxxx',
                            user = 'xxxx',
                            password = 'xxxxx',
                            host= 'xxxx',
                            port='xxxx',
                            connect_timeout = 10)
    
            print("Connection Opened with Postgres")
    
            cursor = conn.cursor()
            extras.execute_values(cursor, insert_query, [records])
            #execute_batch
            #extras.execute_batch(cursor, insert_query, [records])
    
            conn.commit()
    
            # print(record_to_insert)
    
        finally:
            
            if conn:
                cursor.close()
                conn.close()
                print("Connection to postgres was successfully closed")
    

    For more information see Fast execution helpers. Note that both the execute_values and execute_batch functions have a page_size argument with a default value of 100. This is the batch size for the operations. For large data sets you can reduce the time further by increasing page_size to make bigger batches and reduce the number of server round trips.
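
    For example, under the same assumptions as the code above, the batch size could be
    raised like this (1000 is only an illustration; the best value depends on row size
    and available memory):

    # Sketch: send the same records in batches of 1000 rows per round trip.
    extras.execute_values(cursor, insert_query, records, page_size=1000)
    # execute_batch accepts the same keyword argument:
    # extras.execute_batch(cursor, insert_query, records, page_size=1000)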
