- Using execute: 40 inserts per minute
- Using executemany: 41 inserts per minute
- Using extras.execute_values: 42 inserts per minute
import datetime
from typing import Any

import psycopg2
from psycopg2 import extras


def save_return_to_postgres(record_to_insert) -> Any:
    insert_query = """INSERT INTO pricing.xxxx (description, code, unit, price, created_date, updated_date)
                      VALUES %s RETURNING id"""
    records = (record_to_insert[2], record_to_insert[1], record_to_insert[3],
               record_to_insert[4], record_to_insert[0], datetime.datetime.now())
    # df = df[["description", "code", "unit", "price", "created_date", "updated_date"]]
    conn = None
    try:
        conn = psycopg2.connect(database='xxxx',
                                user='xxxx',
                                password='xxxxx',
                                host='xxxx',
                                port='xxxx',
                                connect_timeout=10)
        print("Connection opened with Postgres")
        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        conn.commit()
        # print(record_to_insert)
    finally:
        if conn:
            cursor.close()
            conn.close()
            print("Connection to Postgres was successfully closed")


valores = df.values
for valor in valores:
    save_return_to_postgres(valor)
    print(valor)
2 Answers
I don’t know how many rows per INSERT Postgres can take, but many SQL-based databases accept multiple rows in a single INSERT statement. So instead of running one INSERT per record, try inserting several rows at once in a single command (test it in pure SQL first to see if it works). I had a similar issue and this link helped me:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-multiple-rows/
(Of course, mine was not Postgres, but the idea is the same: cut down on network time by running multiple inserts in one command.)
We’re in this together.
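For illustration only (this sketch is not part of the original answer; it reuses the table and column names from the question, the rows and their values are made-up placeholders, and cursor/conn are assumed to already exist from the question's connection code), a single command carrying several rows could look like this:

# Minimal sketch: one INSERT statement that carries several rows,
# instead of one INSERT (and one round trip) per row.
rows = [
    ("desc 1", "code 1", "m2", 10.0, datetime.datetime.now(), datetime.datetime.now()),
    ("desc 2", "code 2", "m2", 12.5, datetime.datetime.now(), datetime.datetime.now()),
]
placeholders = ", ".join(["(%s, %s, %s, %s, %s, %s)"] * len(rows))
multi_insert = (
    "INSERT INTO pricing.xxxx "
    "(description, code, unit, price, created_date, updated_date) "
    f"VALUES {placeholders} RETURNING id"
)
flat_params = [value for row in rows for value in row]
cursor.execute(multi_insert, flat_params)  # one round trip for all rows
conn.commit()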
Use execute_batch or execute_values and use them over the entire record set. As of now you are not using the batch capabilities of execute_values, because you are inserting a single record at a time. You are further slowing things down by opening and closing a connection for each record, as that is a time/resource-expensive operation. Below is untested, as I don't have the actual data and am assuming what df.values is. For more information see Fast execution helpers. Note that both the execute_values and execute_batch functions have a page_size argument with a default value of 100. This is the batch size for the operations. For large data sets you can reduce the time further by increasing page_size to make bigger batches and reduce the number of server round trips.
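The code this answer refers to is not shown above, so the following is a rough, untested sketch of the idea under the answer's own assumptions: df.values is the same iterable of rows indexed as in the question, and the connection parameters are placeholders.

import datetime

import psycopg2
from psycopg2 import extras

insert_query = """INSERT INTO pricing.xxxx (description, code, unit, price, created_date, updated_date)
                  VALUES %s RETURNING id"""

# Build the full record set once, reusing the question's column order.
now = datetime.datetime.now()
records = [(row[2], row[1], row[3], row[4], row[0], now) for row in df.values]

conn = psycopg2.connect(database='xxxx',
                        user='xxxx',
                        password='xxxxx',
                        host='xxxx',
                        port='xxxx',
                        connect_timeout=10)
try:
    with conn.cursor() as cursor:
        # One connection, one call. page_size controls how many rows are
        # folded into each generated INSERT ... VALUES statement (default 100);
        # fetch=True (psycopg2 >= 2.8) collects the RETURNING ids.
        ids = extras.execute_values(cursor, insert_query, records,
                                    page_size=1000, fetch=True)
    conn.commit()
finally:
    conn.close()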