FYI, I am a complete Python novice. I have a for loop that extracts some object info from an S3 bucket and writes it to a csv file. For every object whose details are retrieved, that data needs to go into the csv. My issue is that I am getting duplicate entries in the csv. What I am expecting in the csv is:
account_id;arn
key1;body1
key2;body2
key3;body3
.
.
. (until the loop runs through all objects in that folder).
But what I am actually getting is:
account_id;arn
key1;body1
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key3;body3
Also, every time I run the script, it keeps appending the old data, which is kind of multiplying the problem.
My current piece of code is:
for objects in my_bucket.objects.filter(Prefix="folderpath"):
    key = objects.key
    body = objects.get()['Body'].read()
    field = ["account_id", "arn"]
    data = [[key, body]]
    with open("my_file.csv", "a") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(field)
        writer.writerows(data)
2 Answers
Your problem is that you open the file and write the header row inside the loop, so every iteration re-adds the header, and opening in append mode ("a") keeps the data from previous runs. It's much easier if you let the csv module do the work once, outside the loop. Start by defining your headers and preparing your csv file like so:
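Here is a minimal sketch of that structure. The objects list below is a hypothetical stand-in for the S3 results (with boto3 it would be my_bucket.objects.filter(Prefix="folderpath")); the point is that the file is opened once in "w" mode, the header is written once, and only the data rows are written inside the loop:

```python
import csv

# Stand-in for the S3 objects; replace with
# my_bucket.objects.filter(Prefix="folderpath") and pull
# objects.key / objects.get()['Body'].read() as in your code.
objects = [("key1", "body1"), ("key2", "body2"), ("key3", "body3")]

fields = ["account_id", "arn"]

# "w" (not "a") starts each run with a fresh file; the header
# goes in exactly once, before the loop.
with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=";", lineterminator="\n")
    writer.writerow(fields)
    for key, body in objects:
        writer.writerow([key, body])
```

This produces one header line followed by one line per object, and re-running the script overwrites the file instead of stacking old results on top.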