
I know that Athena stores every query result in an S3 bucket and that these results just accumulate over time. I want to know whether retaining previous query results in S3 affects the performance of my queries.

For background, I have AWS services (Glue and Lambda) that use Athena to return data, and my query results change frequently. I noticed that there is now 200 GB of data in my S3 bucket. Currently, the bucket has only archive configurations. I'm thinking of adding a lifecycle rule that retains only the last 7 or 30 days of results. Is it really important to keep query results in S3 if we are not using them?

2 Answers


  1. These are two completely different things. Query results are stored in the S3 results location, while the Glue Crawler runs over the source files. There is no performance impact from keeping a history of query results.

  2. Query results can be used for a limited amount of time by Athena if you benefit from the query result reuse feature, or from caching in the AWS Data Wrangler library. In the remaining scenarios there is no impact on performance.
    Query results older than a few hours are useful only for auditing or debugging purposes.
    I definitely recommend adding a lifecycle rule to clean up objects older than x days, where x can be something like 3 or 7.
    Doing so will reduce your S3 storage cost.
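
The lifecycle rule suggested in the second answer could look roughly like the sketch below. It is only an illustration: the bucket name `my-athena-results-bucket`, the prefix `athena-results/`, and the helper names `build_expiration_rule` and `apply_rule` are assumptions, not values from the question. Note that boto3's `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so any existing archive rules would need to be merged into the `Rules` list before applying it.

```python
def build_expiration_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under
    `prefix` once they are `days` days old."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-athena-results-{days}d",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_rule(bucket: str, prefix: str, days: int) -> None:
    """Apply the rule to `bucket`.

    WARNING: this call overwrites the bucket's whole lifecycle
    configuration; merge in existing rules first if there are any.
    """
    import boto3  # imported here so building the rule needs no AWS dependency

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_expiration_rule(prefix, days),
    )


# Example call (assumed names, requires AWS credentials):
# apply_rule("my-athena-results-bucket", "athena-results/", 7)
```

Scoping the rule to the results prefix keeps it from touching source data if the same bucket also holds the files that Glue crawls.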
