
My S3 bucket contains a bunch of files in a multilevel folder structure. I’m trying to identify the top level folders in the hierarchy, but objects.all() returns some but not all folders as distinct ObjectSummary objects. Why?

Sample file structure:

file1.txt
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt

Desired output: [a,b]

What I’m doing:

boto3.resource('s3').Bucket('mybucket').objects.all()

This returns the following ObjectSummary objects:

file1.txt
a/
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt

Notice that a/ is listed as a separate entry, but b/ is not, while the files in b/ are.

I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?

I also understand there could be other ways to achieve my objective, but I want to understand why boto3 is behaving this way.


Answers


  1. Chosen as BEST ANSWER

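    The S3 console does have the concept of creating a folder: its Create folder button creates a dedicated zero-byte object whose key is the folder name followed by a slash (for example a/), separate from the files that use that name as a prefix.

    a/ in the example above was a folder I created manually with that button, but I hadn't done this for b/. So no b/ object exists; b/ only appears as part of the key b/b1/file4.txt.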
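    To make the difference concrete: a folder marker is just an object whose key ends in /. The sketch below (the helper name split_markers is hypothetical, and it works on a plain list of keys rather than calling AWS) separates such markers from ordinary files in the listing the question got back from objects.all().

```python
def split_markers(keys):
    """Split S3 keys into folder-marker objects (key ends with '/') and ordinary files."""
    markers = [k for k in keys if k.endswith('/')]
    files = [k for k in keys if not k.endswith('/')]
    return markers, files


# Keys as returned by objects.all() in the question
keys = ['file1.txt', 'a/', 'a/file2.txt', 'a/a1/file3.txt', 'b/b1/file4.txt']
markers, files = split_markers(keys)
print(markers)  # ['a/']  (only a/ was created as a marker object)
print(files)    # ['file1.txt', 'a/file2.txt', 'a/a1/file3.txt', 'b/b1/file4.txt']
```

    Only a/ shows up as a marker because it is the only "folder" that was created as an object; b/ exists purely as a prefix of another key.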


  2. I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?

    There are no folders in S3. S3 is an object store with a flat namespace: every object is identified by a key, and a key may contain / characters without anything actually being nested. What you call a "folder" is either an object whose key ends in / (such as a/) or simply a prefix shared by several keys. The AWS console displays any prefix ending in / as a folder, which is what leads to all this confusion.

    So a/ is just an object (not a folder) whose key is a/. You don't have a b/ "folder", because no object has exactly the key b/. Instead you have an object whose key is b/b1/file4.txt (not b/), and objects.all() lists only objects that actually exist.
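    Because keys are flat strings, the top-level "folders" can also be derived client-side from the keys themselves, whether or not a marker object exists. A minimal sketch, assuming you already have the keys as a list (the function name top_level_folders is hypothetical):

```python
def top_level_folders(keys):
    """Return the sorted, distinct top-level prefixes (up to and including the first '/')."""
    folders = set()
    for key in keys:
        head, sep, _ = key.partition('/')
        if sep:  # the key contains '/', so it sits "inside" a top-level folder
            folders.add(head + sep)
    return sorted(folders)


keys = ['file1.txt', 'a/', 'a/file2.txt', 'a/a1/file3.txt', 'b/b1/file4.txt']
print(top_level_folders(keys))  # ['a/', 'b/']
```

    This works on the full key list, so it has to read every key in the bucket; for large buckets, letting S3 do the grouping with a delimiter is cheaper.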

  3. To identify "top-level folders", you can use:

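    import boto3
    
    s3_client = boto3.client('s3')
    
    # list_objects_v2 returns at most 1,000 entries per call, so use a paginator
    paginator = s3_client.get_paginator('list_objects_v2')
    prefix_list = []
    for page in paginator.paginate(Bucket='BUCKET-NAME', Delimiter='/'):
        # 'CommonPrefixes' is absent from a page that contains no "folders"
        prefix_list.extend(p['Prefix'] for p in page.get('CommonPrefixes', []))
    print(prefix_list)  # e.g. ['a/', 'b/']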
    

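    By specifying Delimiter='/', the response groups every key that contains a / into a CommonPrefixes entry holding the part of the key up to and including the first /. These prefixes (a/ and b/ here) are effectively the top-level folder names, regardless of whether a folder-marker object exists. Keys without a / (such as file1.txt) are returned in Contents instead.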
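    To see how the Prefix and Delimiter parameters split a listing, here is a rough local emulation of that grouping. It is an illustrative sketch only, not the S3 implementation: the function name list_with_delimiter is hypothetical, it ignores pagination, and it uses Python's string ordering in place of S3's UTF-8 binary key ordering.

```python
def list_with_delimiter(keys, prefix='', delimiter='/'):
    """Roughly emulate how ListObjectsV2 splits keys into Contents and CommonPrefixes."""
    contents, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            contents.append(key)  # no delimiter after the prefix: a plain object
        else:
            # roll the key up into prefix + everything through the first delimiter
            common.add(prefix + rest[:idx + len(delimiter)])
    return contents, sorted(common)


keys = ['file1.txt', 'a/', 'a/file2.txt', 'a/a1/file3.txt', 'b/b1/file4.txt']
print(list_with_delimiter(keys))              # (['file1.txt'], ['a/', 'b/'])
print(list_with_delimiter(keys, prefix='a/')) # (['a/', 'a/file2.txt'], ['a/a1/'])
```

    Note that at the top level the a/ marker object is folded into the a/ common prefix, which is why the delimiter approach reports a/ and b/ identically; it only shows up as its own entry once you list with Prefix='a/'.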
