There is this S3 notification feature described here:
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
and discussed here.
I thought I could mitigate the duplicates a bit by deleting files I have already processed. The problem is that when a second event for the same file arrives (a minute later) and I try to access the file, I don’t get an HTTP 404, I get an ugly AccessDenied:
[ERROR] ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 111, in lambda_handler
raise e
File "/var/task/lambda_function.py", line 104, in lambda_handler
response = s3.get_object(Bucket=bucket, Key=key)
File "/var/runtime/botocore/client.py", line 391, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 719, in _make_api_call
raise error_class(parsed_response, operation_name)
which is unexpected and not acceptable.
I don’t want my lambda to suppress AccessDenied errors, for obvious reasons. Is there an easy way to find out whether the file has already been processed in the past, or whether the notification service is playing tricks?
EDIT:
For those who think this is "an indication of some bug in my application", here is the relevant piece of code:
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
logger.info(f'Requesting file from bucket {bucket} with key {key}')
try:
    response = s3.get_object(Bucket=bucket, Key=key)
except ClientError as e:
    error_code = e.response["Error"]["Code"]
    if error_code == 'NoSuchKey':
        logger.info('Object does not exist any more')
        return
    else:
        raise e
It rather smells like an ugly issue on AWS side to me.
2 Answers
You will need to inspect the error code by loading the object using the s3 resource Object to see whether it’s a 404. That way you can distinguish between a 404 and a 403, for instance, and conclude whether the file has already been deleted in the meantime.
EDIT:
Apologies, I misread the question.
In that case I would just implement idempotency in the processor to make sure you only process each file once.
On the duplicate delivery of notifications: yes, this can happen as documented, but it is relatively rare.
One possible mechanism to deal with this is to build an idempotent workflow, for example that utilizes DynamoDB to record actions against an object at a given time that can be queried to prevent duplicate workflow on the same object. There are a number of idempotency features in the AWS Lambda PowerTools suite or third-party options that you might consider.
More discussion on the duplicate event topic can be found here.
On the AccessDenied error when attempting to download an absent object that you have GetObject permission for: this is actually a security feature designed to prevent the leakage of information. If you have ListBucket permission, then you will get a 404 Not Found response indicating the absence of the object; if you don’t have ListBucket, then you will get a 403 Forbidden response. To correct this, add s3:ListBucket on arn:aws:s3:::mybucket to your IAM policy. More discussion on the AccessDenied topic can be found here.