I have a DynamoDB table with the following structure:
sk | pk
---------------|-----
1#2023-12-01 | abv
1#2023-12-02 | abv
1#2023-12-03 | abv
1#2023-12-04 | abv
1#2023-12-05 | abv
2#2023-12-01 | abv
2#2023-12-02 | abv
2#2023-12-03 | abv
2#2023-12-04 | abv
2#2023-12-05 | abv
...
20#2023-12-11 | abv
20#2023-12-12 | abv
20#2023-12-12 | abv
Now I want to perform a batch delete on this table where pk = 'abv' and sk consists of a dynamic integer between 1 and 30, followed by a literal #, and then a date part in YYYY-MM-DD format that has to be earlier than the current date (assume the current date is 2023-12-12). Essentially, I want to remove all items whose date part is earlier than 2023-12-12, for every prefix from 1 to 30. After the delete operation, the table should only contain items like:
sk | pk
---------------|-----
20#2023-12-12 | abv
20#2023-12-12 | abv
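To make the selection rule concrete, here is a small sketch in plain Python (no DynamoDB involved). The helper name `is_expired` and the sample keys are illustrative assumptions, not part of the original code. It also shows why a single string comparison on the whole sk is not enough: the numeric prefix takes part in the ordering.

```python
from datetime import date


def is_expired(sk: str, cutoff: date) -> bool:
    """Return True when the date part of an sk like '20#2023-12-11' is before cutoff."""
    _prefix, _, day = sk.partition("#")
    return date.fromisoformat(day) < cutoff


cutoff = date(2023, 12, 12)
sort_keys = ["1#2023-12-01", "2#2023-12-05", "20#2023-12-11", "20#2023-12-12"]

to_delete = [sk for sk in sort_keys if is_expired(sk, cutoff)]
to_keep = [sk for sk in sort_keys if not is_expired(sk, cutoff)]

# Comparing the whole sk as a string mixes the prefix into the ordering:
# a later date under prefix 1 still sorts before the cutoff key of prefix 20.
naive_match = "1#2023-12-31" < "20#2023-12-12"
```

Because of that ordering, the date condition has to be applied per prefix rather than as one range over all sort keys.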
How can I achieve this in DynamoDB using a batch delete operation? Any guidance on constructing the batch delete request, or on a more optimized way to code it, would be appreciated. This is what I have in mind, but I am not a fan of DynamoDB's scan() operation.
from datetime import datetime

from boto3.dynamodb.conditions import Attr


class Dynamodb:
    def batch_delete_old_data(self, pk: str):
        try:
            # Calculate the date to keep (e.g., today's date)
            date_to_keep = datetime.now().strftime('%Y-%m-%d')
            # Scan for all items with the specified pk.
            # Note: one scan call returns at most 1 MB of data; follow
            # LastEvaluatedKey to read the remaining pages.
            response = self._table.scan(
                FilterExpression=Attr('pk').eq(pk)
            )
            # ISO dates (YYYY-MM-DD) compare correctly as plain strings
            items_to_delete = [{'pk': item['pk'], 'sk': item['sk']}
                               for item in response.get('Items', [])
                               if self.extract_date_part(item['sk']) < date_to_keep]
            with self._table.batch_writer() as batch:
                for item in items_to_delete:
                    batch.delete_item(Key=item)
            return {"message": "Old data cleanup successful"}
        except Exception as e:
            # Handle errors appropriately
            raise Exception(f"Error: {str(e)}")

    @staticmethod
    def extract_date_part(sk: str) -> str:
        # Extract the date part from the sk, assuming format "prefix#date"
        return sk.split('#')[-1] if '#' in sk else sk
2 Answers
As per the comment from Leeroy Hannigan, this should work: run N conditional range queries using between, then do the batch delete afterwards. I am posting this as an answer and I hope it will help others.

from boto3.dynamodb.conditions import Key
from datetime import datetime, timedelta
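A minimal sketch of that approach (N range queries, then one batch delete), not the answerer's original code. It assumes `table` is a boto3 DynamoDB Table resource, that prefixes run from 1 to 30 as in the question, and that sort keys always look like `<n>#YYYY-MM-DD`. The function name `delete_older_than` is hypothetical. Since BETWEEN is inclusive, the upper bound is the day before the cutoff.

```python
from datetime import date, timedelta


def delete_older_than(table, pk: str, cutoff: date, max_prefix: int = 30) -> int:
    """Delete every item under `pk` whose sk date is strictly before `cutoff`.

    `table` is assumed to behave like boto3.resource("dynamodb").Table(...).
    Returns the number of keys sent for deletion.
    """
    last_day = (cutoff - timedelta(days=1)).isoformat()
    keys = []
    for n in range(1, max_prefix + 1):
        kwargs = {
            # One range query per prefix: "<n>#" <= sk <= "<n>#<day before cutoff>"
            "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :lo AND :hi",
            "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
            "ExpressionAttributeValues": {
                ":pk": pk,
                ":lo": f"{n}#",
                ":hi": f"{n}#{last_day}",
            },
            # Only the key attributes are needed to delete an item
            "ProjectionExpression": "#pk, #sk",
        }
        while True:
            page = table.query(**kwargs)
            keys.extend({"pk": i["pk"], "sk": i["sk"]} for i in page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break
            # Query results are paginated; continue from the last evaluated key
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    # batch_writer buffers the deletes and sends them in batches
    with table.batch_writer() as batch:
        for key in keys:
            batch.delete_item(Key=key)
    return len(keys)
```

With a real table this could be called as `delete_older_than(table, "abv", date.today())`. Each Query only reads items inside the requested range, unlike a Scan, which reads the whole table before filtering.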
It's not clear whether an item with sort key 20#2023-12-02 should continue to exist after the deletion.

If so: you'll need to execute N Query requests to obtain the items you need.

If not: what you are trying to achieve is a range delete, which is not supported in DynamoDB. To achieve your use case, you must first Query to retrieve all of the items where SK is between X and Y. So use Query instead of Scan, which will be more efficient.

With the results, loop through BatchWriteItem calls, deleting up to 25 items per batch. Or, since you use the batch writer, you can pass it the entire result set and it will chunk the items into batches of 25 automatically.
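For the manual variant, here is a rough sketch of how the keys could be split into BatchWriteItem payloads of at most 25 delete requests each. The helper name `delete_request_batches` and the table name `my-table` are hypothetical, and the payload uses the low-level client format, assuming both key attributes are strings.

```python
from typing import Dict, Iterator, List


def delete_request_batches(
    table_name: str, keys: List[Dict[str, str]], size: int = 25
) -> Iterator[dict]:
    """Yield RequestItems payloads, each holding at most `size` DeleteRequests."""
    for start in range(0, len(keys), size):
        chunk = keys[start:start + size]
        yield {
            table_name: [
                {"DeleteRequest": {"Key": {"pk": {"S": k["pk"]}, "sk": {"S": k["sk"]}}}}
                for k in chunk
            ]
        }


# 60 keys are split into payloads of 25, 25 and 10 requests
keys = [{"pk": "abv", "sk": f"1#2023-10-{day:02d}"} for day in range(1, 31)]
keys += [{"pk": "abv", "sk": f"2#2023-10-{day:02d}"} for day in range(1, 31)]
batches = list(delete_request_batches("my-table", keys))
sizes = [len(b["my-table"]) for b in batches]
```

Each payload would be passed as `client.batch_write_item(RequestItems=payload)`, and any `UnprocessedItems` in the response have to be sent again. The batch writer takes care of both the chunking and the resending, which is why it is usually the simpler choice.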