When I add a file to S3 and run a query against Athena, Athena returns the expected result, including the data from this file.
If I then delete that same file from S3 and run the same query, Athena still returns the same data even though the file is no longer in S3.
Is this the expected behaviour? I thought Athena calls out to S3 on every query, but I’m now starting to think there is some sort of caching going on?
Does anyone have any ideas? I can’t find any information online about this.
Thanks for the help in advance!
2 Answers
Thanks for the help guys.
I actually was looking at the wrong files in S3 and the files I thought were removed were still present. Once I deleted them from S3, the query against Athena returned the expected results immediately.
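For anyone hitting the same thing, one way to check which S3 objects a query is really reading is Athena's "$path" pseudo-column. This is only a rough sketch: table_name is a placeholder for your own table, and the output depends entirely on your data.

```sql
-- List the distinct S3 objects that back the rows Athena returns.
-- "$path" is an Athena pseudo-column; table_name is a placeholder.
SELECT DISTINCT "$path" AS s3_object
FROM table_name
ORDER BY s3_object;
```

If a file you believe you deleted still shows up in this list, it is still present under the table's location in S3, possibly under a different key or prefix than the one you were looking at.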
Thanks!
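Athena (Hive) and Glue do not discover partitions on every query; the partition metadata in the catalog is only updated when partitions are loaded, for example by a scheduled Glue crawler. If the table is partitioned and you want queries to see the latest partitions, you need to run
MSCK REPAIR TABLE table_name;
to refresh the partition metadata that Athena uses. Note that this command only adds new partitions; it does not remove partitions whose files have been deleted from S3.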