
I am faced with a situation that I am not quite sure how to solve. Basically, my system receives data from a third-party source via API Gateway and publishes it to an SNS topic, which triggers a Lambda function. Based on the message parameters, the Lambda function pushes the message to one of three different SQS queues. Each queue triggers one of three Lambda functions, which perform one of three possible actions – create, update or delete items, in that order, in another third-party system through its API endpoints.
The usual flow would be to first create an entity on the destination system, and then each subsequent action should update or delete this entity. The problem is that I sometimes receive data for the same entity from the source within milliseconds of each other, and the destination API requires at least 300–400 ms to create an entity. So when my system tries to update the entity, it does not exist yet, and my system creates it instead. But since a create action is already in the process of executing, this creates a duplicate entry on my destination.

So my question is, what is the best practice to consolidate messages for the same entity that arrive within less than a second of each other?

My thoughts so far:
I am thinking of using Redis to consolidate messages for the same entity before pushing them to the SNS topic, but I was hoping there would be a more straightforward approach, as I don't want to introduce another layer of logic.
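Roughly what I have in mind is sketched below (Python with the redis-py client; publish_to_sns, the key names and the one-second window are placeholders, not code I actually have):

    import json
    import redis  # redis-py client

    r = redis.Redis(host="localhost", port=6379, db=0)
    CONSOLIDATION_WINDOW_SECONDS = 1  # placeholder consolidation window

    def handle_incoming(message: dict) -> None:
        entity_id = message["entity_id"]
        lock_key = f"entity-lock:{entity_id}"

        # SET NX EX is atomic: only the first message for this entity
        # within the window gets published straight away.
        first_seen = r.set(lock_key, json.dumps(message), nx=True,
                           ex=CONSOLIDATION_WINDOW_SECONDS)

        if first_seen:
            publish_to_sns(message)
        else:
            # A message for this entity is already in flight; stash it so it
            # can be merged and flushed later (merge/flush logic omitted here).
            r.set(f"entity-pending:{entity_id}", json.dumps(message),
                  ex=CONSOLIDATION_WINDOW_SECONDS)

    def publish_to_sns(message: dict) -> None:
        ...  # existing SNS publish logic (placeholder)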
Any help would be much appreciated. Thank you.

2 Answers


  1. The best option would be to use an Amazon SQS FIFO queue, with each message using a Message Group ID that is set to the unique ID of the item that is being created.

    In a FIFO queue, SQS will ensure that messages are processed in-order, and will only allow one message per Message Group ID to be received at a time. Thus, any subsequent messages for the same Message Group ID will wait until an existing message has been fully processed.
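    As a rough illustration, sending to such a FIFO queue with boto3 might look like this (the queue URL and the deduplication ID scheme are assumptions, not something taken from your setup):

        import json
        import boto3

        sqs = boto3.client("sqs")

        # Placeholder FIFO queue URL -- substitute your own.
        QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/entity-actions.fifo"

        def send_entity_action(entity_id: str, action: str, payload: dict) -> None:
            # All actions for the same entity share a MessageGroupId, so SQS
            # delivers them strictly in order, one at a time per entity.
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"action": action, "payload": payload}),
                MessageGroupId=entity_id,
                # Required unless content-based deduplication is enabled on the queue.
                MessageDeduplicationId=f"{entity_id}-{action}-{payload.get('version', '')}",
            )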

    If this is not acceptable, then AWS Lambda now supports batch windows of up to 5 minutes for functions with Amazon SQS as an event source:

    AWS Lambda now allows customers using Amazon Simple Queue Service (Amazon SQS) as an event source to define a wait period, called MaximumBatchingWindowInSeconds, to allow messages to accumulate in their SQS queue before invoking a Lambda function. In addition to Batch Size, this is a second option to send records in batches, to reduce the number of Lambda invokes. This option is ideal for workloads that are not time-sensitive, and can choose to wait to optimize cost.

    Previously, Lambda functions polling from an SQS queue would send messages in batches of up to 10 before invoking the function. Now, customers can also define a time window that Lambda should wait to poll messages from their SQS queue before invoking their function. Lambda will wait for up to 300 seconds to poll messages from the SQS queue. When a batch window is defined, Lambda will also allow customers to define a batch size of up to 10,000 messages.

    To get started, when creating a new Lambda function or updating an existing function with SQS as an event source, customers can set the MaximumBatchingWindowInSeconds field to any value between 0 and 300 seconds on the AWS Management Console, the AWS CLI, AWS SAM or AWS SDK for Lambda. This feature is available in all AWS Regions where AWS Lambda and Amazon SQS are available, and requires no additional charge to use.
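    For example, a rough sketch of configuring the batch window with boto3 (the function name and queue ARN are placeholders):

        import boto3

        lambda_client = boto3.client("lambda")

        lambda_client.create_event_source_mapping(
            FunctionName="process-entity-actions",                                # placeholder
            EventSourceArn="arn:aws:sqs:us-east-1:123456789012:entity-actions",   # placeholder
            BatchSize=100,                      # up to 10,000 once a batch window is set
            MaximumBatchingWindowInSeconds=60,  # wait up to 60 s for messages to accumulate
        )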

  2. the lambda function pushes the message to one of three different SQS queues

    So when my system tries to update the entity, it does not exist yet, and my system creates it instead. But since a create action is already in the process of executing, this creates a duplicate entry on my destination

    By using multiple queues you have created a race condition for yourself, and now you are trying to patch it.

    Based on the information provided and the context – as already answered – a single FIFO queue with a message group ID would be more appropriate (do you really need three queues?).
    If latency is critical, a streaming solution could work as well.

    As you describe the issue, I think you don't need to combine the messages (you could, using Redis, Amazon Kinesis Data Analytics, DynamoDB, and so on), but rather avoid creating the issue in the first place.

    Options

    • a single FIFO queue
    • an idempotent and thread-safe backend service that can handle concurrent updates (transactions, atomic updates, etc.)

    Also, if you can create "duplicate" entries, it means the unique indexes are not enforced; they exist for exactly that reason.

    You did not specify the backend service (RDBMS, DynamoDB, MongoDB, something else?); each offers a way to handle this.
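    For instance, if the destination store were DynamoDB (just an assumption for illustration), a conditional write makes the create idempotent, so a racing second create fails instead of inserting a duplicate:

        import boto3
        from botocore.exceptions import ClientError

        dynamodb = boto3.client("dynamodb")
        TABLE = "entities"  # placeholder table with entity_id as the partition key

        def create_entity(entity_id: str, payload: str) -> None:
            try:
                dynamodb.put_item(
                    TableName=TABLE,
                    Item={"entity_id": {"S": entity_id}, "payload": {"S": payload}},
                    # Only succeeds if the entity does not already exist.
                    ConditionExpression="attribute_not_exists(entity_id)",
                )
            except ClientError as err:
                if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                    # The entity already exists: treat this create as an update instead.
                    pass
                else:
                    raise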
