
We plan to use AWS SQS to queue events created by a web service and then use several workers to process those events. Each event must be processed only once. According to the AWS SQS documentation, a standard queue can "occasionally" deliver duplicate messages, but offers unlimited throughput. A FIFO queue will not produce duplicates, but is limited to 300 API calls per second (with batchSize=10, the equivalent of 3,000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both queue types meet our throughput requirement. But when I started using an SQS FIFO queue, I found I had to do extra work: either provide the extra parameters "MessageGroupId" and "MessageDeduplicationId", or enable the "ContentBasedDeduplication" setting. So I am not sure which is the better solution. We only need messages not to be duplicated; we do not need them to be FIFO.

Solution #1:
Use an AWS SQS FIFO queue. For each message, generate a UUID for the "MessageGroupId" and "MessageDeduplicationId" parameters.
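A minimal sketch of what Solution #1's send path might look like with boto3 (the queue URL is hypothetical; FIFO queue names must end in ".fifo"). One caveat worth noting: a fresh random "MessageDeduplicationId" means SQS only collapses retries of that exact API call, not logical duplicates your producer might send as separate calls.

```python
import json
import uuid


def build_fifo_send_params(queue_url: str, event: dict) -> dict:
    """Build kwargs for send_message on a FIFO queue (Solution #1).

    A fresh UUID for MessageGroupId makes every message its own group,
    which removes ordering constraints. A fresh UUID for
    MessageDeduplicationId means SQS deduplicates only automatic retries
    of this exact API call, not upstream logical duplicates.
    """
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(event),
        "MessageGroupId": str(uuid.uuid4()),
        "MessageDeduplicationId": str(uuid.uuid4()),
    }


def send_event(queue_url: str, event: dict) -> None:
    import boto3  # assumes boto3 is installed and AWS credentials are configured

    sqs = boto3.client("sqs")
    sqs.send_message(**build_fifo_send_params(queue_url, event))
```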

Solution #2:
Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID for "MessageGroupId" only.
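For context on Solution #2: with "ContentBasedDeduplication" enabled, SQS derives the deduplication ID itself as a SHA-256 hash of the message body, so two sends with identical bodies within the deduplication interval are treated as one message. The equivalent computation, for illustration:

```python
import hashlib


def content_based_dedup_id(message_body: str) -> str:
    """What SQS computes when ContentBasedDeduplication is on:
    a SHA-256 hash of the message body. Identical bodies sent within
    the 5-minute deduplication interval collapse into one message."""
    return hashlib.sha256(message_body.encode("utf-8")).hexdigest()
```

The flip side is that two *distinct* events with byte-identical bodies sent close together would also be collapsed, so include something unique (an event ID, timestamp) in the body if that matters.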

Solution #3:
Use an AWS SQS standard queue with AWS ElastiCache (either Redis or Memcached). For each message, save the "MessageId" field in the cache server and check it on receipt: if it already exists, the message has been processed. (By the way, how long should the "MessageId" stay in the cache? The AWS SQS documentation does not say how far apart duplicate deliveries can be.)
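A sketch of the consumer-side check in Solution #3, with an in-memory stand-in for the cache (in production this would be a single atomic Redis call such as `SET message_id 1 NX EX ttl`). On the TTL question: AWS does not document a duplicate-delivery window for standard queues; FIFO queues use a 5-minute deduplication interval, so a TTL of several minutes is a plausible starting point, erring on the long side.

```python
import time
from typing import Dict, Optional


class MessageDeduplicator:
    """In-memory sketch of Solution #3's duplicate check.

    In production, replace this with Redis: r.set(message_id, 1, nx=True,
    ex=ttl) returns True only for the first writer, atomically.
    """

    def __init__(self, ttl_seconds: int = 300):
        self.ttl = ttl_seconds
        self._seen: Dict[str, float] = {}

    def first_time(self, message_id: str, now: Optional[float] = None) -> bool:
        """Return True if this MessageId has not been seen within the TTL."""
        now = time.time() if now is None else now
        # Expire old entries (Redis does this for us via EX).
        self._seen = {m: t for m, t in self._seen.items() if now - t < self.ttl}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True
```

Note that check-then-process is racy if two workers receive the same duplicate simultaneously, which is why the atomic SET NX form matters in the real implementation.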

2 Answers


  1. You are making your system complicated with SQS.

    We have moved to Kinesis Streams, and it works flawlessly. Here are the benefits we have seen:

    1. Ordering of events
    2. Triggering an event when data appears in the stream
    3. Delivery in batches
    4. Error handling is left to the receiver
    5. Going back in time in case of issues (at the cost of a more involved implementation)
    6. Higher performance than SQS

    Hope it helps.
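A minimal producer sketch along the lines this answer suggests (stream name is hypothetical). One caveat relevant to the original question: Kinesis itself does not deduplicate, so producer retries and consumer reprocessing can still yield duplicate records, and idempotent consumers remain necessary.

```python
import json


def build_kinesis_record(stream_name: str, event: dict, ordering_key: str) -> dict:
    """Build kwargs for kinesis put_record.

    Records sharing a PartitionKey land on the same shard, which is what
    gives the per-key ordering this answer lists as benefit 1.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": ordering_key,
    }


def put_event(stream_name: str, event: dict, ordering_key: str) -> None:
    import boto3  # assumes boto3 is installed and AWS credentials are configured

    boto3.client("kinesis").put_record(
        **build_kinesis_record(stream_name, event, ordering_key)
    )
```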

    • My first question would be: why is it so important that you don't get duplicate messages? An ideal solution would be to use a standard queue and design your workers to be idempotent. For example, if the messages contain something like a task ID and the workers store each completed task's result in a database, they can simply ignore any message whose task ID already exists in the database.
    • Don't use receipt handles for application-side deduplication, because they change every time a message is received. In other words, SQS does not guarantee the same receipt handle for duplicate messages.
    • If you insist on deduplication in the queue itself, then you have to use a FIFO queue.
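The idempotent-worker idea in the first bullet can be sketched as follows, using SQLite for brevity; the table and column names are illustrative, and the uniqueness constraint on the task ID is what makes duplicate deliveries harmless.

```python
import sqlite3

# The PRIMARY KEY on task_id makes processing idempotent: a duplicate
# delivery's INSERT is ignored, so the work is recorded at most once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE completed_tasks (task_id TEXT PRIMARY KEY, result TEXT)"
)


def process_once(task_id: str, payload: str) -> bool:
    """Process a message idempotently.

    Returns True if this task was processed now, False if it was a
    duplicate that had already been recorded.
    """
    result = payload.upper()  # stand-in for the real work
    cur = conn.execute(
        "INSERT OR IGNORE INTO completed_tasks (task_id, result) VALUES (?, ?)",
        (task_id, result),
    )
    conn.commit()
    return cur.rowcount == 1
```

With a real database you would use the equivalent conflict-ignoring insert (e.g. `ON CONFLICT DO NOTHING` in PostgreSQL) so that the check and the write are a single atomic statement.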