skip to Main Content

I have a DynamoDB table with the following structure:

sk             |  pk
---------------|-----
1#2023-12-01   |  abv
1#2023-12-02   |  abv
1#2023-12-03   |  abv
1#2023-12-04   |  abv
1#2023-12-05   |  abv
2#2023-12-01   |  abv
2#2023-12-02   |  abv
2#2023-12-03   |  abv
2#2023-12-04   |  abv
2#2023-12-05   |  abv
...
20#2023-12-11  |  abv
20#2023-12-12  |  abv
20#2023-12-12  |  abv

Now, I want to perform a batch delete operation on this table where pk = 'abv' and sk represents a dynamic integer between 1 and 30, followed by a literal # and then date part YYYY-MM-DD which have to less than the current date(assuming current date is 2023-12-12). Essentially, I want to remove all items where the date is less than 1-30#2023-12-12. So after the delete operation on the table, the final table should only contain items like:

sk             |  pk
---------------|-----
20#2023-12-12  |  abv
20#2023-12-12  |  abv

How can I achieve this in DynamoDB using a batch delete operation? Any guidance on constructing the batch delete request or any other optimized way to code it?. I am thinking this, but I am not a fan of the scan() operation of Dynamodb.

from datetime import datetime, timedelta
from typing import Dict, List
class Dynamodb:

    def batch_delete_old_data(self, pk: str):
        try:
            # Calculate the date to keep (e.g., today's date)
            date_to_keep = datetime.now().strftime('%Y-%m-%d')

            # Scan for all items with the specified pk
            response = self._table.scan(
                FilterExpression=Key('pk').eq(pk)
            )

            items_to_delete = [{'pk': item['pk'], 'sk': item['sk']} for item in response.get('Items', [])
                               if self.extract_date_part(item['sk']) < date_to_keep]

            with self._table.batch_writer() as batch:
                for item in items_to_delete:
                    batch.delete_item(Key=item)

            return {"message": "Old data cleanup successful"}

        except Exception as e:
            # Handle errors appropriately
            raise Exception(f"Error: {str(e)}")

    @staticmethod
    def extract_date_part(sk: str) -> str:
        # Extract the date part from the sk, assuming format "prefix#date"
        return sk.split('#')[-1] if '#' in sk else sk

2

Answers


  1. Chosen as BEST ANSWER

    As per the comment from Leeroy Hannigan, this should work for the conditional range query using the N number of Between and doing the batch delete later. I am posting this as an answer and I hope it will help others

    from boto3.dynamodb.conditions import Key from datetime import datetime, timedelta

    class Dynamodb:
    
        # ... (existing code)
    
        def batch_delete_old_data(self, pk_value: str):
            try:
                current_date_iso8601 = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')  
    
                start_date_for_between_filter = "2023-12-01"
                dynamic_integer_range = [str(i) for i in range(1, 32)]  # Assuming your range is from 1 to 31
    
                # Batch delete items
                with self._table.batch_writer() as batch:
                    for dynamic_integer in dynamic_integer_range:
                        # Construct a KeyConditionExpression for the query
                        key_condition_expression = (
                            Key('pk').eq(pk_value) &
                            Key('sk').between(f"{dynamic_integer}#{start_date_for_between_filter}", f"{dynamic_integer}#{current_date_iso8601}")
                        )
    
                        # Perform the query with the constructed KeyConditionExpression
                        response = self._table.query(
                            KeyConditionExpression=key_condition_expression
                        )
    
                        # Prepare items for batch delete
                        items_to_delete = [{'pk': item['pk'], 'sk': item['sk']} for item in response.get('Items', [])]
    
                        # Batch delete items
                        for item in items_to_delete:
                            batch.delete_item(Key=item)
    
                return {"message": "Old data cleanup successful"}
    
            except Exception as e:
                # Handle errors appropriately
                raise Exception(f"Error: {str(e)}")
    

  2. It’s not clear if you had an item with sort key 20#2023-12-02 would it also maintain to exist after the deletion? If so:
    You’ll need to execute N Query requests to obtain the items you needs.

    If not:
    What you are trying to achieve is a range delete, which is not supported in DynamoDB.

    To achieve your use-case you must first Query to retrieve all of the items where SK between X and Y. So instead of Scan use Query which will be more optimal .

    With the results, loop through a BatchWrite deleting 25 items per batch. Or as you use batch writer, you can pass the entire result set and it’ll chunk into 25 automatically.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search