FYI, I am a complete Python novice. I have a for loop that extracts some object info from an S3 bucket and writes it to a csv file. For every object whose details are retrieved, that data needs to go into the csv. My issue is that I am getting duplicate entries in the csv. What I am expecting in the csv is:
account_id;arn
key1;body1
key2;body2
key3;body3
.
.
. (until the loop runs through all objects in that folder).
But what I am actually getting is:
account_id;arn
key1;body1
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key3;body3
Also, every time I run the script, it keeps appending the old data, which is kind of multiplying the problem.
My current piece of code is:
for objects in my_bucket.objects.filter(Prefix="folderpath"):
    key = objects.key
    body = objects.get()['Body'].read()
    field = ["account_id", "arn"]
    data = [[key, body]]
    with open("my_file.csv", "a") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(field)
        writer.writerows(data)
2 Answers
Your problem is that you open the file and write the header row inside the loop, so every iteration re-adds the header, and opening in append mode ("a") keeps the data from previous runs. It's much easier if you let the csv module do the work once, outside the loop. Start by defining your headers and preparing your csv file like so:
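Here is a minimal sketch of that structure. The objects list below is a hypothetical stand-in for the S3 results (with boto3 it would be my_bucket.objects.filter(Prefix="folderpath")); the point is that the file is opened once in "w" mode, the header is written once, and only the data rows are written inside the loop:

```python
import csv

# Stand-in for the S3 objects; replace with
# my_bucket.objects.filter(Prefix="folderpath") and pull
# objects.key / objects.get()['Body'].read() as in your code.
objects = [("key1", "body1"), ("key2", "body2"), ("key3", "body3")]

fields = ["account_id", "arn"]

# "w" (not "a") starts each run with a fresh file; the header
# goes in exactly once, before the loop.
with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=";", lineterminator="\n")
    writer.writerow(fields)
    for key, body in objects:
        writer.writerow([key, body])
```

This produces one header line followed by one line per object, and re-running the script overwrites the file instead of stacking old results on top.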