
I have an Amazon S3 bucket with the following structure.

s3
|_ Year 2020 folder
|_ Year 2021 folder
|                 |_ Jan
|                 |_ Feb
|                      |_ filename_20210201.txt
|                      |_ filename_20210204.txt 
|_ Year 2023 folder
|                 |_ Jan
|                 |_ Feb
|                 |_ Mar
|                      |_ filename_20230301.txt  

Each of the year folders has a subfolder for each month of the year, and the .txt files live inside the month folders. Year and month folders are added as they are needed.

How do I get the latest filename from the latest folder using Node.js?

2 Answers


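  1. This would be my approach:

    First, a helper function that checks whether a path is a directory:

    const fs = require('fs');
    const path = require('path');

    // Returns true if the given path is a directory.
    const isDirectory = source => fs.lstatSync(source).isDirectory();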
    

    Next, we order the contents of a directory: we read every entry (files and directories), map each one to an object holding its name and modification time, then sort them newest first.

    const orderFiles = dir => {
        return fs.readdirSync(dir)
            // Pair each entry name with its last-modified time.
            .map((file) => ({ file, mtime: fs.lstatSync(path.join(dir, file)).mtime }))
            // Sort descending, so the newest entry comes first.
            .sort((a, b) => b.mtime.getTime() - a.mtime.getTime());
    }
    

    Finally, pass in a directory. This function orders the directory contents and recurses: if the first result (the newest entry) is a directory, it calls itself on that directory; if it is a file, it returns the absolute path to that file.

    const findFile = dir => {
        const files = orderFiles(dir);
        if (files.length === 0) return undefined;
        // Build the full path; the bare entry name is not valid outside `dir`.
        const newest = path.join(dir, files[0].file);
        return isDirectory(newest) ? findFile(newest) : path.resolve(newest);
    }
    
  2. Your goal is to "get the latest filename and increment the date by one day".

    The only true way to do this is to call ListObjects() and examine the LastModified date on each file.

    If your bucket contains a large number of objects, this can be slow because ListObjects() returns at most 1,000 objects per call. You can reduce the number of objects scanned by providing a Prefix so that fewer objects are returned. For example, if you know that the year is 2023, you could pass Prefix='2023/'.
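The listing approach above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `findLatestObject` and the `listPage` callback are hypothetical names, and `listPage` is assumed to wrap a single ListObjectsV2 call (the paginated successor to ListObjects) and return its `Contents`, `IsTruncated`, and `NextContinuationToken` fields. Skipping keys that end in `/` is also an assumption, made because folders created in the console show up as placeholder objects.

```javascript
// Sketch: scan every page under a prefix and keep the most recently modified object.
// `listPage` is a hypothetical callback that performs one ListObjectsV2 request.
async function findLatestObject(listPage, prefix) {
  let latest;
  let token;
  do {
    const page = await listPage({ Prefix: prefix, ContinuationToken: token });
    for (const obj of page.Contents ?? []) {
      // Assumption: keys ending in "/" are folder placeholders, not real files.
      if (obj.Key.endsWith('/')) continue;
      if (!latest || obj.LastModified > latest.LastModified) latest = obj;
    }
    // Keep paging while S3 reports that more results are available.
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return latest; // undefined when nothing matches the prefix
}

// Possible wiring with the AWS SDK for JavaScript v3 (bucket name is a placeholder):
//   const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');
//   const client = new S3Client({});
//   const listPage = (params) =>
//     client.send(new ListObjectsV2Command({ Bucket: 'my-bucket', ...params }));
//   findLatestObject(listPage, '2023/').then((obj) => console.log(obj && obj.Key));
```

Passing the request function in as a parameter keeps the paging logic separate from the SDK, so it can be exercised with a fake before pointing it at a real bucket.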

    An alternative approach would be:

    • Create a trigger on the S3 bucket that fires when a new object is created
    • Have the trigger call an AWS Lambda function that stores the Key of the object in AWS Systems Manager Parameter Store
    • Later, when you want to know the last Key that was used, you can query the Parameter Store rather than listing the S3 objects

    Or, if you control the code that stores the objects in S3, then that code could write to Parameter Store directly, treating it like a mini-database.
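The trigger-based alternative might look something like the sketch below. The names `PARAM_NAME`, `keysFromS3Event`, and `makeHandler` are hypothetical, and `putParameter` is assumed to be a callback wrapping the Systems Manager PutParameter call. Object keys in S3 event notifications arrive URL-encoded with `+` for spaces, so they are decoded before being stored.

```javascript
// Hypothetical parameter name; pick whatever fits your own naming scheme.
const PARAM_NAME = '/my-app/latest-s3-key';

// Extract the decoded object keys from an S3 "object created" event.
function keysFromS3Event(event) {
  return (event.Records ?? []).map((record) =>
    decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
  );
}

// Build a Lambda handler. `putParameter` is a hypothetical callback that
// performs one PutParameter request against Parameter Store.
function makeHandler(putParameter) {
  return async (event) => {
    const keys = keysFromS3Event(event);
    if (keys.length === 0) return undefined;
    const lastKey = keys[keys.length - 1];
    // Overwrite the same parameter each time so it always holds the newest key.
    await putParameter({ Name: PARAM_NAME, Value: lastKey, Type: 'String', Overwrite: true });
    return lastKey;
  };
}

// Possible wiring with the AWS SDK for JavaScript v3:
//   const { SSMClient, PutParameterCommand } = require('@aws-sdk/client-ssm');
//   const ssm = new SSMClient({});
//   exports.handler = makeHandler((params) => ssm.send(new PutParameterCommand(params)));
```

Reading the value back later would then be a single GetParameter call instead of a full bucket listing.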
