I started using AWS Data Wrangler and Athena to upload my data files to S3 and to query them, respectively. My question is about the procedure to "safely" update the data in the table.
Here is what I did:

- I used AWS Data Wrangler, wr.s3.to_parquet(df=my_df, path='s3://temp', dataset=True, partition_cols=['date'], concurrent_partitioning=True, mode='append'), to export my DataFrame to S3 as Parquet files (a runnable sketch of this call follows this list).
- I used AWS Glue to crawl my data files in S3 and create Table A.
- Via Athena, I can query my data in Table A.
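For context, here is a minimal, runnable sketch of that export step. The bucket path and the DataFrame contents are placeholders I made up for illustration, not values from my real data:

```python
import pandas as pd
import awswrangler as wr

# Sample DataFrame with a 'date' column to partition on (placeholder data).
my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "value": ["a", "b", "c"],
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
})

# Write the DataFrame to S3 as partitioned Parquet files,
# appending to whatever already exists under the path.
wr.s3.to_parquet(
    df=my_df,
    path="s3://my-bucket/temp/",   # placeholder bucket/prefix
    dataset=True,
    partition_cols=["date"],
    concurrent_partitioning=True,
    mode="append",
)
```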
Now I would like to update my data in Table A. My questions are:

- (Insert new rows): If I follow what I did above and change the mode to "overwrite", then all my data in S3 is removed (expected, but not desirable!) and the new data files are written to S3. The Glue crawl therefore leaves only the NEW rows in Table A, removing all my past data from Table A. This is not what I want. Is there a safe way to do this?
- (Update rows): What if I would like to UPDATE data in Table A? Using AWS Data Wrangler, how can I achieve this without adding redundant rows to Table A?
I have tested (2) using mode='append', and indeed I ended up with redundant rows in Table A.

I would appreciate it if you could share your thoughts.
Thanks,
2 Answers
I think you need to use mode="overwrite_partitions".
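With overwrite_partitions, only the partitions that appear in the DataFrame being written are deleted and rewritten; all other partitions under the dataset path stay untouched. A minimal sketch of what that could look like for your date-partitioned dataset (bucket path and values are placeholders):

```python
import pandas as pd
import awswrangler as wr

# New/updated data for a single date partition (placeholder values).
new_df = pd.DataFrame({
    "id": [4, 5],
    "value": ["d", "e"],
    "date": ["2021-01-02", "2021-01-02"],
})

# Only the partitions present in new_df (here date=2021-01-02) are
# replaced; all other partitions under the path remain as they were.
wr.s3.to_parquet(
    df=new_df,
    path="s3://my-bucket/temp/",   # placeholder, same dataset path as before
    dataset=True,
    partition_cols=["date"],
    mode="overwrite_partitions",
)
```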
You can use the API below:
Ref: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.merge_upsert_table.html
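For the update case, the linked merge_upsert_table function merges a "delta" DataFrame into an existing Glue/Athena table on a primary key, so matching rows are updated rather than duplicated. A hedged sketch follows; the database name, table name, primary-key column, and data are assumptions for illustration, and the exact signature and availability depend on your awswrangler / AWS SDK for pandas version, so check the linked docs:

```python
import pandas as pd
import awswrangler as wr

# Rows to upsert into Table A; 'id' is assumed to be the primary key
# (placeholder data and names, adjust to your schema).
delta_df = pd.DataFrame({
    "id": [2, 6],
    "value": ["b-updated", "f"],
    "date": ["2021-01-01", "2021-01-03"],
})

# Merge the delta into the existing Glue table: rows whose primary key
# already exists are updated, rows with new keys are inserted.
wr.s3.merge_upsert_table(
    delta_df=delta_df,
    database="my_database",   # placeholder Glue database name
    table="table_a",          # placeholder Glue table name
    primary_key=["id"],
)
```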