Let’s say I have a message producer, which sends a very high number of messages with versions V1, V2, …, VN, in a short timespan.
We have a service, which is not required to react to every such message. We can have a lazy approach, where we collect messages for a minute, and send forward the message with the latest version.
Wanted to understand whether there is a standard solve for this?
Note that I am working on AWS, so any service that they provide for this use case would also work out fine.
2
Answers
Instead of using a Queue (which requires messages to be specifically read), you could use a Stream.
From What Is Amazon Kinesis Data Streams?:
You could write some code that reads the stream for a given time period and determines which records have the highest version number.
To explain the different between a Queue and a Stream, think of somebody (the ‘Producer’) writing information on a piece of paper and throwing it into a jar. It keeps doing this, and the jar gets lots of pieces of paper. The jar is the Queue and the pieces of paper are the Messages. When somebody (the ‘Consumer’) wants to process a message, they reach into the jar and retrieve a random message. If you are using a First-In-First-Out (FIFO) queue, then replace the jar with a shoebox and always add the pieces of paper from one end and always retrieve them from the other end.
In contrast, a Stream is like a film strip where the film frames (Messages) are always kept in the correct order. Consumers can go back to an earlier ‘frame’ of film and start reading the stream. Multiple consumers can read the stream, each starting at the same or different frames of the the film. They can even go backwards and re-read some frames. This differs from queues where ‘consuming’ a message from the queue also removes it from the queue.
For your situation, you could send the messages to a Kinesis Data Stream instead of an Amazon SQS queue. Your app could then look at a segment of the stream, grab all the messages, figure out which ones have the highest version and then just forward those messages. It is a more complex program than one that merely pulls a message off a queue, but it gives you the benefit of being able to look at multiple messages simultaneously (and even repeatedly) to determine which one to forward.
If you only want to send the ‘very latest’ message, then it is even easier — just grab the last message from the stream and forward that one. No need to even look at earlier messages.
You can you SQS Batch processing and process these messages using lambda/ecs/ec2 and forward the message with highest version downstream.
You can configure lambda to read from queue for a certain period of time
here:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html