I was looking to discover all available prefixes at https://commondatastorage.googleapis.com/chromium-browser-snapshots, but I can’t seem to find documentation on listing AWS S3 prefixes. I found plenty of documentation on how to list all bucket items under a given prefix, but not on how to list the prefixes themselves.

I found documentation on an ls command, but I have no idea what its intended use is. I think it is for some kind of AWS management console. I am looking for a RESTful API that I can call with an XHR request in JavaScript or from PowerShell.

I believe that URL is the storage bucket for all Chromium builds ever produced by the Chromium developer team. According to the response from that URL, it uses the AWS S3 API.

Google’s cloud storage API also seems to hint at the fact that it uses AWS’s S3 API.

2 Answers


  1. When you list the contents of a bucket using ListObjects() and pass a Delimiter, S3 will return a list of CommonPrefixes:

    aws s3api list-objects --bucket MYBUCKET --delimiter '/' --query 'CommonPrefixes[]' --output text
    

    This will show a list of all Prefixes before the first /.
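
    The same listing can be sketched in Python with boto3. This is only an illustration, not part of the command above: the helper name list_top_level_prefixes is made up here, and it uses the list_objects_v2 paginator rather than the list-objects call from the CLI example. The client is passed in, so the usual AWS credentials setup is assumed to happen elsewhere.

```python
def list_top_level_prefixes(client, bucket, prefix=""):
    """Collect CommonPrefixes from every page of a delimited listing.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The function name and signature are hypothetical.
    """
    paginator = client.get_paginator("list_objects_v2")
    found = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # A page with no grouped keys simply has no 'CommonPrefixes' entry.
        for entry in page.get("CommonPrefixes", []):
            found.append(entry["Prefix"])
    return found
```

    Usage would look like list_top_level_prefixes(boto3.client("s3"), "MYBUCKET"). Passing one of the returned prefixes back in as prefix lists the next level down.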

    If you want to list ALL prefixes in the bucket (without recursively going through each prefix and making the above call), the easiest method is to retrieve all object keys and then remove everything after the last slash in each key.

    Something like:

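    import boto3

    s3_resource = boto3.resource('s3')

    prefixes = set()

    # 'j-stack-ver' is an example bucket name; replace it with your own.
    for obj in s3_resource.Bucket('j-stack-ver').objects.all():
        key = obj.key
        if '/' in key:
            # Keep everything up to and including the last slash.
            # Note: this records only each key's immediate parent prefix, so an
            # intermediate prefix such as a/ appears only if a key sits directly under it.
            last_slash = key.rfind('/')
            prefixes.add(key[:last_slash + 1])

    print(sorted(prefixes))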
    
  2. The core of the REST API that S3 and S3-compatible systems use is documented by AWS.

    To get prefixes, you can issue a request like:

    https://commondatastorage.googleapis.com/chromium-browser-snapshots?delimiter=/
    

    Which returns an XML document like:

    <ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
        <Name>chromium-browser-snapshots</Name>
        <Prefix/>
        <Marker/>
        <NextMarker>Android/</NextMarker>
        <MaxKeys>1</MaxKeys>
        <Delimiter>/</Delimiter>
        <IsTruncated>true</IsTruncated>
        <Contents>
            <Key>index-new.html</Key>
            <Generation>1407426113710000</Generation>
            <MetaGeneration>1</MetaGeneration>
            <LastModified>2014-08-07T15:41:53.709Z</LastModified>
            <ETag>"e5ff54658e153a3305d563e99f3e18d8"</ETag>
            <Size>15137</Size>
        </Contents>
        <CommonPrefixes>
            <Prefix>Android/</Prefix>
        </CommonPrefixes>
    </ListBucketResult>
    

    There may be one or more CommonPrefixes/Prefix nodes holding the common prefixes, and one or more Contents/Key items for objects that do not share a common prefix. If the IsTruncated node is set to true, you will need to perform another request with the contents of NextMarker as the marker value:

    https://commondatastorage.googleapis.com/chromium-browser-snapshots?delimiter=/&marker=Android/
    

    Note that you can’t blindly paste the value from NextMarker into the URL. You need to decode its value first, either with an XML library or by parsing it manually, so that any XML entities are decoded, and then encode the result using standard URL-encoding rules.

    Finally, this ignores the rules for performing signed requests. For open buckets like the one in the question, this doesn’t matter, but for most S3 buckets you will need to follow the documentation to sign a request.
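
    The paging loop described above can be sketched in Python using only the standard library. This is a rough illustration for an unsigned, publicly readable bucket: the function names parse_listing, build_url and list_prefixes are made up for this example, and the XML parser is relied on to decode entities before the marker is URL-encoded again.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket URL from the question; any open S3-style endpoint is assumed to work.
BASE_URL = "https://commondatastorage.googleapis.com/chromium-browser-snapshots"


def _local(tag):
    """Strip the XML namespace, e.g. '{ns}Prefix' -> 'Prefix'."""
    return tag.rsplit("}", 1)[-1]


def parse_listing(xml_text):
    """Return (prefixes, next_marker) for one ListBucketResult document.

    next_marker is None when the listing is not truncated (or when the
    server sent no NextMarker). ElementTree decodes XML entities for us.
    """
    root = ET.fromstring(xml_text)
    prefixes, next_marker, truncated = [], None, False
    for child in root:
        name = _local(child.tag)
        if name == "CommonPrefixes":
            prefixes.extend(p.text for p in child if _local(p.tag) == "Prefix")
        elif name == "NextMarker":
            next_marker = child.text
        elif name == "IsTruncated":
            truncated = (child.text or "").strip().lower() == "true"
    return prefixes, (next_marker if truncated else None)


def build_url(base_url, prefix="", marker=None):
    """Build the listing URL; urlencode applies standard URL encoding."""
    params = {"delimiter": "/"}
    if prefix:
        params["prefix"] = prefix
    if marker:
        params["marker"] = marker
    return base_url + "?" + urllib.parse.urlencode(params)


def list_prefixes(base_url=BASE_URL, prefix=""):
    """Follow NextMarker until IsTruncated is no longer true."""
    found, marker = [], None
    while True:
        with urllib.request.urlopen(build_url(base_url, prefix, marker)) as resp:
            xml_text = resp.read().decode("utf-8")
        prefixes, marker = parse_listing(xml_text)
        found.extend(prefixes)
        if marker is None:
            return found
```

    Calling list_prefixes() with no arguments would query the bucket from the question; passing prefix="Android/" (a value taken from the sample response) descends one level. Signed requests are not handled here.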
