I want to write a script for mass importing hundreds of csv files into PostgreSQL with differing columns/formatting
How do you bulk import data that has differing columns/formats into another table on postgresql using a csv file?
For example this code below outlines how to bulk import data into SQL, yet do not know how import data that has different columns.
So how would we modify the code below to accomplish this task? I am new to postgresql so please pardon my stupidity.
import pandas as pd
import psycopg2
from io import StringIO
# Database connection parameters
db_params = {
"dbname": "your_database",
"user": "your_username",
"password": "your_password",
"host": "your_host",
"port": "your_port",
}
# File path to your CSV
csv_file = "your_file.csv"
table_name = "your_table_name"
# Connect to PostgreSQL
conn = psycopg2.connect(**db_params)
cur = conn.cursor()
# Step 1: Read CSV into Pandas DataFrame
df = pd.read_csv(csv_file)
# Step 2: Create table dynamically
columns = df.columns
col_str = ", ".join([f'"{col}" TEXT' for col in columns]) # Assuming TEXT for simplicity
create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({col_str});"
cur.execute(create_table_query)
conn.commit()
# Step 3: Use COPY to load the data
buffer = StringIO()
df.to_csv(buffer, index=False, header=False) # Write DataFrame to buffer without header
buffer.seek(0)
# COPY data into the table
copy_query = f"COPY {table_name} ({', '.join(columns)}) FROM STDIN WITH CSV"
cur.copy_expert(copy_query, buffer)
conn.commit()
# Close connections
cur.close()
conn.close()
print(f"Data from {csv_file} has been successfully imported into {table_name}.")
I tried to aggregate all column names and was able to create a file with all column names yet was unsure of how to align the data with the columns.
2
Answers
Okay so here is what I did and it worked!
The idea is not bad, but if I understand correctly, you copied the "hundreds of CSV" into one file? This is a bad idea! You have to solve it in an outer iteral to go through the files and create tables with the same file name in a row. The number of columns in the CSV is determined by your program.
You should build the cycle in such a way that you enter a path where the files are and go through them and create the table for each file and upload the data.