skip to Main Content

I want to loop through my data and save every component based data in separate sheet in the same excel workbook in S3 bucket.

Dataframe df looks as below:

enter image description here

Below is my code:

    today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    components=["COMP1","COMP2","COMP3"]
    filename = 'auto_export_'+today
    
    for comp in components:
        df1= df[df['component']==comp]
        print(comp)
        print(df1)
        with io.BytesIO() as output:
            with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
                df1.append_df_to_excel(writer, sheet_name=comp, index=False)
            data = output.getvalue()
        s3 = boto3.resource('s3')
        s3.Bucket('mybucket').put_object(Key='folder1/'+filename+'.xlsx', Body=data)

This is generating the excel file correctly but writing only COMP3 data into it. It is not writing COMP1 and COMP2 sheets. Any guidance on how to fix this problem?

2

Answers


  1. You are writing each component/dataframe as an excel file to S3, overwriting the previous file. You’re doing everything inside the for loop, so you are writing out component 1, then completely obliterating all that work by overwriting it with component 2, then doing the same thing again by overwriting that with component 3.

    You need to refactor your code like so:

    today = datetime.datetime.now().strftime('%m_%d_%Y_%H_%M_%S')
    components=["COMP1","COMP2","COMP3"]
    filename = 'auto_export_'+today
    
    
    # First create an output
    with io.BytesIO() as output:
    
        # Now create the Excel file using the output
        with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    
            # Iterate over each component, adding each as a sheet to the Excel file
            for comp in components:
                df1= df[df['component']==comp]
                print(comp)
                print(df1)
                
                # Add this component to the Excel file as a new sheet
                df1.append_df_to_excel(writer, sheet_name=comp, index=False)
                
            # After we have finished iterating over all components, get the entire Excel file   
            data = output.getvalue()
            
            # Copy this Excel file to S3
            s3 = boto3.resource('s3')
            s3.Bucket('mybucket').put_object(Key='folder1/'+filename+'.xlsx', Body=data)
    
    Login or Signup to reply.
  2. Using a list comp and .groupby:

    from datetime import datetime
    from io import BytesIO
    
    import boto3
    import pandas as pd
    
    
    today = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
    filename = f"auto_export_{today}"
    
    with BytesIO() as output:
        with pd.ExcelWriter(output) as writer:
            for df in [x.reset_index(drop=True) for _, x in df.groupby("Component")]:
                sheet_name = df["Component"][0]
                df.to_excel(writer, sheet_name, index=False)
    
            data = output.getvalue()
    
            s3 = boto3.resource("s3")
            s3.Bucket("mybucket").put_object(Key=f"folder1/{filename}.xlsx", Body=data)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search