
FYI, I am a complete Python novice. I have a for loop that extracts object details from an S3 bucket and writes them to a CSV file. Each object retrieved should produce exactly one row in the CSV. My issue is that I am getting duplicate entries. What I expect in the CSV is:

account_id;arn

key1;body1

key2;body2

key3;body3
.
.
. (until the loop runs through all objects in that folder).

But what I am getting is:

account_id;arn

key1;body1

account_id;arn

key1;body1

account_id;arn

key2;body2

account_id;arn

key1;body1

account_id;arn

key2;body2

account_id;arn

key3;body3

Also, every time I run the script, it appends to the data from earlier runs, which multiplies the problem.

My current piece of code is:

for objects in my_bucket.objects.filter(Prefix="folderpath"):
    key = objects.key
    body = objects.get()['Body'].read()
    field = ["account_id","arn"]
    data = [
        [key, body]
    ]
    with open("my_file.csv", "a") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(field)
        writer.writerows(data)

2 Answers


  1. import csv
    
    # Assuming `my_bucket` and `folderpath` are defined earlier
    
    # Open the CSV file in write mode
    with open("my_file.csv", "w") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
    
        # Write header row once at beginning of file
        writer.writerow(["account_id", "arn"])
    
        # Create a list to store content for all rows
        data = []
    
        # Iterate over objects in the S3 bucket
        for objects in my_bucket.objects.filter(Prefix="folderpath"):
            key = objects.key
            body = objects.get()["Body"].read()
    
            # Append the row
            data.append([key, body])
    
        # Write all the data at end in a single I/O operation
        writer.writerows(data)
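
The duplicates in the question come from two things: the header row is written inside the loop, and the file is opened in append mode ("a"), so every run adds to whatever earlier runs left behind. Here is a minimal sketch of the fix that runs without S3 access. It assumes a hypothetical `FAKE_OBJECTS` list of (key, body) pairs standing in for `my_bucket.objects.filter(...)`, and `write_report` is an illustrative helper name, not part of any library:

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the S3 objects: (key, body) pairs.
FAKE_OBJECTS = [("key1", "body1"), ("key2", "body2"), ("key3", "body3")]


def write_report(path, objects):
    """Write one header row followed by one row per object."""
    # "w" truncates the file, so rerunning does not keep rows from old runs.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(["account_id", "arn"])  # header, written once
        for key, body in objects:
            writer.writerow([key, body])  # exactly one row per object


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "my_file_demo.csv")
    write_report(path, FAKE_OBJECTS)
    write_report(path, FAKE_OBJECTS)  # second run overwrites the first
    with open(path) as f:
        print(f.read())
```

Writing each row inside the loop, as here, also avoids holding every object body in memory at once, which may matter if the folder contains many or large objects.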
    
  2. It's much easier if you use `csv.DictWriter` from Python's csv module.

    Start by defining your headers and preparing your CSV file like so:

    import csv
    
    with open('my_file.csv', 'w', newline='') as csvfile:
        fieldnames = ['account_id', 'arn']
        # delimiter=';' matches the semicolon-separated output the question expects
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=';')
    
        writer.writeheader()
        
        for objects in my_bucket.objects.filter(Prefix="folderpath"):
            key = objects.key
            body = objects.get()['Body'].read()
            
            writer.writerow({'account_id': key, 'arn': body})
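
One detail worth checking in both answers: with boto3, `objects.get()['Body'].read()` returns `bytes`, and the csv module writes a bytes value using its `str()` form, so a cell would read `b'body1'` rather than `body1`. Here is a minimal sketch of decoding before writing, assuming the objects hold UTF-8 text. The `rows_to_csv` helper and the sample data are hypothetical, and `io.StringIO` stands in for the real file:

```python
import csv
import io


def rows_to_csv(pairs, encoding="utf-8"):
    """Return CSV text for (key, raw_body) pairs, decoding bytes bodies."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["account_id", "arn"],
        delimiter=";",
        lineterminator="\n",
    )
    writer.writeheader()
    for key, raw_body in pairs:
        # Decode bytes to str; leave values that are already str untouched.
        if isinstance(raw_body, bytes):
            body = raw_body.decode(encoding)
        else:
            body = raw_body
        writer.writerow({"account_id": key, "arn": body})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [("key1", b"body1"), ("key2", b"body2")]
    print(rows_to_csv(sample))
```

In the real loop the same idea would amount to calling `.decode("utf-8")` on the result of `read()`, provided the stored objects really are UTF-8 text. If a body contains the delimiter or a line break, the csv module quotes that field automatically under its default quoting setting.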
    