Save dataframe to same excel workbook but different sheets - Amazon Web Sevices

Rick
February 10, 2023
220 views
0 votes
2 Answers

I want to loop through my data and save every component based data in separate sheet in the same excel workbook in S3 bucket.

Dataframe df looks as below:

Below is my code:

    today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    components=["COMP1","COMP2","COMP3"]
    filename = 'auto_export_'+today
    
    for comp in components:
        df1= df[df['component']==comp]
        print(comp)
        print(df1)
        with io.BytesIO() as output:
            with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
                df1.append_df_to_excel(writer, sheet_name=comp, index=False)
            data = output.getvalue()
        s3 = boto3.resource('s3')
        s3.Bucket('mybucket').put_object(Key='folder1/'+filename+'.xlsx', Body=data)

This is generating the excel file correctly but writing only COMP3 data into it. It is not writing COMP1 and COMP2 sheets. Any guidance on how to fix this problem?

Answers

You are writing each component/dataframe as an excel file to S3, overwriting the previous file. You’re doing everything inside the for loop, so you are writing out component 1, then completely obliterating all that work by overwriting it with component 2, then doing the same thing again by overwriting that with component 3.

You need to refactor your code like so:

today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
components=["COMP1","COMP2","COMP3"]
filename = 'auto_export_'+today


# First create an output
with io.BytesIO() as output:

    # Now create the Excel file using the output
    with pd.ExcelWriter(output, engine='xlsxwriter') as writer:

        # Iterate over each component, adding each as a sheet to the Excel file
        for comp in components:
            df1= df[df['component']==comp]
            print(comp)
            print(df1)
            
            # Add this component to the Excel file as a new sheet
            df1.append_df_to_excel(writer, sheet_name=comp, index=False)
            
        # After we have finished iterating over all components, get the entire Excel file   
        data = output.getvalue()
        
        # Copy this Excel file to S3
        s3 = boto3.resource('s3')
        s3.Bucket('mybucket').put_object(Key='folder1/'+filename+'.xlsx', Body=data)

Using a list comp and .groupby:

from datetime import datetime
from io import BytesIO

import boto3
import pandas as pd


today = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
filename = f"auto_export_{today}"

with BytesIO() as output:
    with pd.ExcelWriter(output) as writer:
        for df in [x.reset_index(drop=True) for _, x in df.groupby("Component")]:
            sheet_name = df["Component"][0]
            df.to_excel(writer, sheet_name, index=False)

        data = output.getvalue()

        s3 = boto3.resource("s3")
        s3.Bucket("mybucket").put_object(Key=f"folder1/{filename}.xlsx", Body=data)

Please signup or login to give your own answer.

Click here to cancel reply.

Save dataframe to same excel workbook but different sheets – Amazon Web Sevices

Answers