I am tasked with storing my company's data in a database and I am using PostgreSQL, with Python to access and interact with it. My issue so far is that I have multiple Excel datasets that are quite complex and large, with multiple sheets (up to 20 sheets per Excel file). These are each in "xlsx" format and I am trying to get them into "csv" to load into tables in the database. I tried using pandas, and it converted each Excel file to CSV format. However, there were many NaN values, an unnamed column was added, and the output CSV file's columns did not correlate to their associated rows of data. It became very messy and incorrectly organized.

Any ideas on how I can convert from "xlsx" to "csv" with Excel files that have multiple sheets?

Thanks!

I tried using pandas to convert the Excel "xlsx" files into "csv" files. I got CSV files as output; however, upon opening them I saw an added unnamed column, NaNs, and columns not corresponding to their associated rows of data. In summary, the output CSV files were very messy and not a correct depiction of the information contained in the original Excel spreadsheets.

2 Answers


  1. You can use the code pattern below to load the data in each Excel sheet into a PostgreSQL table:

    import pandas as pd
    from sqlalchemy import create_engine
    
    # Replace these with your PostgreSQL connection details
    username = 'your_username'
    password = 'your_password'
    host = 'localhost'
    port = '5432'
    database = 'your_database'
    
    # Path to your Excel file
    excel_file = 'path_to_your_excel_file.xlsx'
    
    # Create a SQLAlchemy engine
    engine = create_engine(f'postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}')
    
    # Read the Excel file
    xls = pd.ExcelFile(excel_file)
    
    # Loop through each sheet
    for sheet_name in xls.sheet_names:
        # Read each sheet into a DataFrame
        df = pd.read_excel(xls, sheet_name)
    
        # Load the DataFrame into a PostgreSQL table;
        # the table name will be the same as the sheet name
        df.to_sql(sheet_name, engine, if_exists='replace', index=False)
    
    print("Data loaded successfully.")
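
    If a sheet has blank header cells or extra title rows above the real header, pandas invents "Unnamed: N" column names and pads the gaps with NaN, which matches the symptoms described in the question. The header= and skiprows= arguments of read_excel can fix this at read time; as a minimal in-memory sketch (the DataFrame below is hypothetical), the same cleanup can also be applied after loading:

    ```python
    import numpy as np
    import pandas as pd

    # Hypothetical DataFrame mimicking a sheet read with a blank header cell:
    # pandas names blank header cells "Unnamed: N" and pads short rows with NaN.
    df = pd.DataFrame({
        "id": [1, 2, np.nan],
        "name": ["a", "b", np.nan],
        "Unnamed: 2": [np.nan, np.nan, np.nan],
    })

    df = df.dropna(how="all")                              # drop rows that are entirely NaN
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]  # drop blank-header columns

    print(list(df.columns))  # ['id', 'name']
    print(len(df))           # 2
    ```

    Doing this before df.to_sql keeps the empty columns out of the PostgreSQL tables.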
    
  2. import pandas as pd
    
    def convert_excel_to_csv(excel_file):
        xls = pd.ExcelFile(excel_file)
        for sheet_name in xls.sheet_names:
            df = pd.read_excel(xls, sheet_name)
            df = df.dropna(how='all')  # drop rows that are entirely NaN
            df = df.fillna('')  # replace remaining NaN values with empty strings
            df = df.loc[:, ~df.columns.str.contains('^Unnamed')]  # drop auto-added "Unnamed" columns
            csv_file = f"{excel_file}_{sheet_name}.csv"
            df.to_csv(csv_file, index=False)
            print(f"Created CSV for sheet: {sheet_name}")
    convert_excel_to_csv('path_to_excel.xlsx')
    

    This will create one CSV file per sheet, named {excel_file}_{sheet_name}.csv.
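
    Note that the f-string in the function keeps the ".xlsx" extension inside the CSV filename (e.g. report.xlsx_Sheet1.csv). A small sketch, using Python's pathlib and a hypothetical file path, that strips the extension first:

    ```python
    from pathlib import Path

    def csv_name(excel_file: str, sheet_name: str) -> str:
        # Drop the .xlsx suffix, then append the sheet name.
        stem = Path(excel_file).with_suffix("")
        return f"{stem}_{sheet_name}.csv"

    print(csv_name("data/report.xlsx", "Sheet1"))  # data/report_Sheet1.csv
    ```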
