I am tasked with storing my company's data in a database and I am using PostgreSQL, with Python to access and interact with it. My issue so far is that I have multiple Excel datasets that are quite complex and large, with multiple sheets (up to 20 sheets per Excel file). These are each in "xlsx" format and I am trying to get them into "csv" to load into tables in the database. I tried using pandas, and it converted each Excel file to CSV format. However, there were many NaN values, an unnamed column was added, and the output CSV file's columns did not correlate to their associated rows of data. It became very messy and incorrectly organized.

Any ideas on how I can convert from "xlsx" to "csv" with Excel files that have multiple sheets?

Thanks!

I tried using pandas to convert the Excel "xlsx" files into "csv" files. I got CSV files as output; however, upon opening them I saw an added unnamed column, NaNs, and columns not corresponding to their associated rows of data. In summary, the output CSV files were very messy and not a correct depiction of the information contained in the original Excel spreadsheets.

2 Answers


  1. You can use the code pattern below to load the data in each Excel sheet into a PostgreSQL table:

    import pandas as pd
    from sqlalchemy import create_engine
    
    # Replace these with your PostgreSQL connection details
    username = 'your_username'
    password = 'your_password'
    host = 'localhost'
    port = '5432'
    database = 'your_database'
    
    # Path to your Excel file
    excel_file = 'path_to_your_excel_file.xlsx'
    
    # Create a SQLAlchemy engine
    engine = create_engine(f'postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}')
    
    # Read the Excel file
    xls = pd.ExcelFile(excel_file)
    
    # Loop through each sheet
    for sheet_name in xls.sheet_names:
        # Read each sheet into a DataFrame
        df = pd.read_excel(xls, sheet_name)
    
        # Load the DataFrame into a PostgreSQL table;
        # the table name will be the same as the sheet name
        df.to_sql(sheet_name, engine, if_exists='replace', index=False)
    
    print("Data loaded successfully.")
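
    If a sheet has blank header cells or extra title rows above the real header, pandas invents "Unnamed: N" column names and pads the gaps with NaN, which matches the symptoms described in the question. The header= and skiprows= arguments of read_excel can fix this at read time; as a minimal in-memory sketch (the DataFrame below is hypothetical), the same cleanup can also be applied after loading:

    ```python
    import numpy as np
    import pandas as pd

    # Hypothetical DataFrame mimicking a sheet read with a blank header cell:
    # pandas names blank header cells "Unnamed: N" and pads short rows with NaN.
    df = pd.DataFrame({
        "id": [1, 2, np.nan],
        "name": ["a", "b", np.nan],
        "Unnamed: 2": [np.nan, np.nan, np.nan],
    })

    df = df.dropna(how="all")                              # drop rows that are entirely NaN
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]  # drop blank-header columns

    print(list(df.columns))  # ['id', 'name']
    print(len(df))           # 2
    ```

    Doing this before df.to_sql keeps the empty columns out of the PostgreSQL tables.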
    
  2. import pandas as pd
    
    def convert_excel_to_csv(excel_file):
        xls = pd.ExcelFile(excel_file)
        for sheet_name in xls.sheet_names:
            df = pd.read_excel(xls, sheet_name)
            df = df.dropna(how='all')  # drop rows that are entirely NaN
            df = df.fillna('')  # replace remaining NaN values with empty strings
            df = df.loc[:, ~df.columns.str.contains('^Unnamed')]  # drop auto-added "Unnamed" columns
            csv_file = f"{excel_file}_{sheet_name}.csv"
            df.to_csv(csv_file, index=False)
            print(f"Created CSV for sheet: {sheet_name}")
    convert_excel_to_csv('path_to_excel.xlsx')
    

    This will create one CSV file per sheet, named {excel_file}_{sheet_name}.csv.
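
    Note that the f-string in the function keeps the ".xlsx" extension inside the CSV filename (e.g. report.xlsx_Sheet1.csv). A small sketch, using Python's pathlib and a hypothetical file path, that strips the extension first:

    ```python
    from pathlib import Path

    def csv_name(excel_file: str, sheet_name: str) -> str:
        # Drop the .xlsx suffix, then append the sheet name.
        stem = Path(excel_file).with_suffix("")
        return f"{stem}_{sheet_name}.csv"

    print(csv_name("data/report.xlsx", "Sheet1"))  # data/report_Sheet1.csv
    ```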
