
I’m using a LambdaToSqsToLambda pattern.

The second Lambda (the one that polls messages from SQS) can fail, so I’m using batchItemFailures to report the failed messages back to SQS and letting SQS return those messages to the queue to be processed again.

Here’s the AWS CDK code:

    new LambdaToSqsToLambda(stack, `id1`, {
        existingProducerLambdaObj: lambdaA,
        existingConsumerLambdaObj: lambdaB,
        queueProps: {
            queueName: `Queue`,
            retentionPeriod: Duration.days(1),
            visibilityTimeout: Duration.hours(3),
            receiveMessageWaitTime: Duration.seconds(20),
            deliveryDelay: Duration.minutes(10),
        },
        sqsEventSourceProps: {
            maxConcurrency: 2,
            batchSize: 1,
            maxBatchingWindow: Duration.minutes(5),
            reportBatchItemFailures: true,
        },
    });

lambdaA places some messages in the queue, and that works fine.

lambdaB then polls those messages from SQS and "slowly" processes them.

    async handler(event: SQSEvent) {
        // Map the raw SQS records to a simpler shape (custom helper).
        const allRecords = this.sqs.toMessagesBodies(event);
        const batchItemFailures: SQSBatchItemFailure[] = [];

        for (const record of allRecords) {
            try {
                await this.process(record.body);
            } catch (e) {
                console.error('Error: ' + e);
                // Report only the failed message so SQS redelivers just that one.
                batchItemFailures.push({
                    itemIdentifier: record.recordId,
                });
            }
        }

        return {
            batchItemFailures: removeEmptyFromArray(batchItemFailures),
        };
    }

My specific need is to process the messages from SQS very slowly, ideally taking up to 1 hour to process the last message out of over 60 inserted messages. This is mainly to address RATE_LIMIT issues.

The problem I’m encountering is that SQS returns the failed messages to the queue, but Lambda polls them again immediately, causing continuous and immediate reprocessing. I’d like to implement a backoff retry approach, even if it’s a static delay (e.g., re-poll this message from SQS every 5 minutes).

Any help or suggestions would be greatly appreciated.

Thank you!

I’ve tried playing around with queueProps (visibilityTimeout, receiveMessageWaitTime, deliveryDelay and retentionPeriod), but nothing really seems to help.

I also tried increasing the maxBatchingWindow in sqsEventSourceProps, but could not see significant results.

2 Answers


  1. "even if it’s a static delay (e.g., re-poll this message from SQS every 5 minutes)"

    You can execute a Lambda periodically with a Scheduled Event, but I’m not sure you have to. It sounds like you want the messages to be delivered immediately the first time, but to control how long before they are redelivered in some circumstances.

    "I’ve tried playing around with queueProps (visibilityTimeout, receiveMessageWaitTime, deliveryDelay and retentionPeriod), but nothing really seems to help."

    visibilityTimeout on the queue would be what you want if every message should be retried after the same delay, and deliveryDelay would be perfect if you wanted new messages to be delayed that long. But what you actually want is to change the visibility timeout of an individual message.

    The SQS-to-Lambda event includes each message’s receipt handle. You can use it to change that message’s visibility timeout, which controls how long SQS waits before making the message visible for redelivery.
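
    A minimal sketch, assuming the AWS SDK v3 for JavaScript and that the consumer knows its queue URL (here via a hypothetical QUEUE_URL environment variable): before reporting a record in batchItemFailures, change that record’s visibility timeout so SQS holds it back for the delay you want.

        import { SQSClient, ChangeMessageVisibilityCommand } from '@aws-sdk/client-sqs';
        import type { SQSRecord } from 'aws-lambda';

        const sqsClient = new SQSClient({});

        // Delay redelivery of a single failed message by setting its visibility
        // timeout before it is reported as a batch item failure.
        // QUEUE_URL is an assumed environment variable.
        async function delayRedelivery(record: SQSRecord, delaySeconds: number): Promise<void> {
            await sqsClient.send(new ChangeMessageVisibilityCommand({
                QueueUrl: process.env.QUEUE_URL,
                ReceiptHandle: record.receiptHandle,
                VisibilityTimeout: delaySeconds, // the message stays invisible for this long
            }));
        }

    You could call something like this in the catch block and scale delaySeconds with record.attributes.ApproximateReceiveCount to get an exponential backoff instead of a fixed delay.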

  2. I’m assuming you’re using Standard queues.

    Lambda starts processing FIVE batches at a time with FIVE concurrent invocations of your function. If messages are still available, Lambda increases the number of processes that are reading batches by up to 300 more instances per minute. The maximum number of batches that an event source mapping can process simultaneously is 1,000.

    Lambda Scaling with SQS

    So Lambda is quite aggressive. Consequently, it’s hard to achieve the delay you want just by playing with the knobs that are available.

    OPTION 1:
    Set reserved concurrency on the function to throttle it down to an acceptable level of concurrency. The downside is that messages may need a longer retention period; otherwise they may expire and be discarded forever, even if you have a dead-letter queue set on the source queue. SQS does allow you to redrive messages back to source queues, though.
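
    A minimal CDK sketch, assuming lambdaB is defined in the same stack (the construct id and asset path below are placeholders for however it is really built): reserved concurrency is set when the function is constructed, and a longer retention period on the queue gives throttled messages time to be retried.

        import * as lambda from 'aws-cdk-lib/aws-lambda';

        // Cap the consumer's concurrency at construction time. The construct id
        // and asset path are placeholders, not the asker's real values.
        const lambdaB = new lambda.Function(stack, 'LambdaB', {
            runtime: lambda.Runtime.NODEJS_18_X,
            handler: 'index.handler',
            code: lambda.Code.fromAsset('lambda-b'),
            reservedConcurrentExecutions: 1, // throttle the consumer
        });

        // And in queueProps, keep messages around long enough to survive the
        // throttling, e.g. retentionPeriod: Duration.days(4).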

    OPTION 2:

    If you mainly need the delay in the hope of avoiding the rate limits, you could call the UpdateEventSourceMapping API to disable the event source mapping whenever the rate limit keeps getting hit, flipping the Enabled param between true and false as needed. You could then fine-tune a cron job tied to this specific function that re-enables the SQS trigger after, say, a minute; the actual frequency would need tuning. This way you can delay Lambda’s polling of the queue when you need to. You might also want to ensure that you have sufficient regional concurrency and not too many functions with async triggers in that region, as that might delay the cron job’s actions.
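
    A sketch of the toggling itself, assuming the AWS SDK v3 for JavaScript and that the mapping’s UUID is available, here via a hypothetical EVENT_SOURCE_MAPPING_UUID environment variable (the UUID is returned when the mapping is created and can also be found with ListEventSourceMappings):

        import { LambdaClient, UpdateEventSourceMappingCommand } from '@aws-sdk/client-lambda';

        const lambdaClient = new LambdaClient({});

        // Enable or disable the SQS trigger on the consumer function.
        // EVENT_SOURCE_MAPPING_UUID is an assumed environment variable.
        async function setSqsTriggerEnabled(enabled: boolean): Promise<void> {
            await lambdaClient.send(new UpdateEventSourceMappingCommand({
                UUID: process.env.EVENT_SOURCE_MAPPING_UUID,
                Enabled: enabled,
            }));
        }

    The scheduled job would then call the same function with enabled = true to resume polling.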

    OPTION 3:

    I might be wrong here, but I think your use case is really about making sure that messages are retried if needed, but definitely processed eventually.

    So I think it’s worth exploring SQS FIFO, since you’re not concerned about speed per se. With FIFO queues, Lambda’s concurrency scales according to the number of active message groups. It might seem a bit hacky, but you could set a batch size of 1, a max batching window, and a unique message group for each message. Whatever you do, make sure you don’t lose messages: set explicit deduplication IDs and the maximum message retention period possible, and monitor the queue depth and Age of Oldest Message metrics. You can also make the visibility timeout longer so that a message can be retried multiple times if the rate limit gets hit.
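
    A sketch of the producer side under that approach, assuming the queue has been switched to FIFO and its URL is available via a hypothetical FIFO_QUEUE_URL environment variable: a unique MessageGroupId per message keeps Lambda’s concurrency tied to the number of active groups, and an explicit MessageDeduplicationId prevents identical bodies from being deduplicated away.

        import { randomUUID } from 'node:crypto';
        import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

        const sqs = new SQSClient({});

        // Send each message in its own message group with its own dedup id.
        // FIFO_QUEUE_URL is an assumed environment variable.
        async function enqueue(body: string): Promise<void> {
            await sqs.send(new SendMessageCommand({
                QueueUrl: process.env.FIFO_QUEUE_URL,
                MessageBody: body,
                MessageGroupId: randomUUID(),
                MessageDeduplicationId: randomUUID(),
            }));
        }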

    OPTION 4:

    Forget the SQS trigger altogether and create your own manual poller: a Lambda invoked on a schedule (for example via EventBridge Scheduler) that reads from the queue itself.
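
    A sketch of such a poller, assuming a scheduled invocation and a hypothetical QUEUE_URL environment variable; processMessage stands in for the real business logic.

        import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

        const sqs = new SQSClient({});

        // Scheduled poller: pull a small batch each run, process it, and delete
        // only the messages that succeeded. Failed messages become visible again
        // after the visibility timeout. QUEUE_URL is an assumed environment variable.
        export async function handler(): Promise<void> {
            const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
                QueueUrl: process.env.QUEUE_URL,
                MaxNumberOfMessages: 5,
                WaitTimeSeconds: 20,
            }));

            for (const message of Messages) {
                try {
                    await processMessage(message.Body ?? '');
                    await sqs.send(new DeleteMessageCommand({
                        QueueUrl: process.env.QUEUE_URL,
                        ReceiptHandle: message.ReceiptHandle,
                    }));
                } catch (e) {
                    console.error('Processing failed, message will be retried later:', e);
                }
            }
        }

        // Placeholder for the real processing logic.
        async function processMessage(body: string): Promise<void> {
            // ...
        }

    The schedule’s frequency then becomes your effective processing rate, and anything that fails simply reappears after the visibility timeout.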
