skip to Main Content

I have an S3 bucket that contains multiple different files by date with different schemas (they are not in sub-folders). For example:

FileTypeA-032323
FileTypeA-032423
FileTypeA-032523
FileTypeB-032323
FileTypeB-032423
FileTypeC-032323
FileTypeD-032423
FileTypeD-032423
FileTypeD-032523

I need to merge all of the files for FileTypeA into a single table, then do the same for FileTypeB, etc. The intention is to eventually connect these tables to PowerBI. Is there a way to do this using Athena if the files are not in sub-folders, or is there an easy way to create sub-folders and copy files with the same naming convention into them? New daily files of each type will continue to arrive in the original s3 location. I’ve tried a lot of different approaches and keep ending up at dead ends.

I have tried DataBrew parameterized URL functionality to create datasets for each file type but kept getting errors regarding specific files when trying to load the project. I have also tried using AWS glue but didn’t see a way to union more than two files at a time.

2

Answers


  1. I would recommend creation of an AWS Lambda function with an Amazon S3 trigger:

    • When a new object is created, S3 invokes the Lambda function
    • The Lambda function examines the Key (filename) of the new object and determines where it should be moved
    • The function calls CopyObject() to copy the object to the desired location and then calls DeleteObject to remove the original file (see Moving file based on filename with Amazon S3)

    In Amazon Athena, you can define an External Table that points to the appropriate path in the folder (eg TypeA/) and any queries will automatically run against all objects in that path.

    Login or Signup to reply.
  2. You might be able to it with plain athena. You can use a special $path WHERE clause to query specific files.

    It is explained here how to do it. You can create a CTAS query with athena to create a new table by selecting the file specified in $path.

    Another option would be only add files to your partitions that you want to transform. Hence your query would only consider those.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search