My S3 bucket contains a bunch of files in a multilevel folder structure. I'm trying to identify the top-level folders in the hierarchy, but objects.all() returns some but not all folders as distinct ObjectSummary objects. Why?
Sample file structure:
file1.txt
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt
Desired output: [a,b]
What I’m doing:
boto3.resource('s3').Bucket('mybucket').objects.all()
This returns the following ObjectSummary objects:
file1.txt
a/
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt
Notice that a/ is listed as a separate entry, but b/ is not, while the files in b/ are.
I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?
I also understand there could be other ways to achieve my objective, but I want to understand why boto3 is behaving this way.
3 Answers
S3 does have the concept of creating a folder, through the Create Folder button in the console, which creates a dedicated object with just the folder name, separate from the files that have this as a prefix. a/ in the example above was a folder I created manually, but I hadn't done this for b/.

There are no folders in S3. The concept of a folder does not exist in object storage, which is what S3 is. What you call a "folder" is just a visual representation of an object with the key a/ or b/. Basically, the AWS console artificially calls everything with / a folder, leading to all this confusion.
So a/ is just an object (not a folder) called a/. You don't have a b/ "folder" because there is no object called precisely b/. Instead you have an object called b/b1/file4.txt (not b/).

To identify "top-level folders", you can list the bucket with a delimiter. By specifying Delimiter='/', the listing returns a list of CommonPrefixes that are effectively the folder names.
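
The delimiter approach could look roughly like the sketch below. The function name top_level_folders is hypothetical, and the client is passed in as an argument so the function is not tied to any one credential setup. It relies on the list_objects_v2 paginator, so buckets with more than one page of results are covered too.

```python
def top_level_folders(client, bucket):
    """Return top-level "folder" names in a bucket, without the trailing slash.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The function name and signature are illustrative, not part of boto3.
    """
    paginator = client.get_paginator("list_objects_v2")
    folders = []
    # Delimiter="/" makes S3 roll up every key containing a "/" into a
    # CommonPrefixes entry instead of listing it individually.
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        # Pages with no prefixes omit the CommonPrefixes key entirely.
        for entry in page.get("CommonPrefixes", []):
            folders.append(entry["Prefix"].rstrip("/"))
    return folders
```

With boto3 installed and credentials configured, calling top_level_folders(boto3.client('s3'), 'mybucket') against the sample structure in the question should give a and b, whether or not a dedicated a/ or b/ object exists, since the prefixes are derived from the keys themselves.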