Where I’m At
I have a simple Node.js Kinesis Data Streams consumer that consumes IoT data from various devices. Often the data comes from more than 250 devices every 15 seconds, which means a high volume of JSON is streaming into my consumer. I don’t do much processing of the JSON in the consumer except converting the epoch timestamp to a date-time format.
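(For context, the conversion is roughly something like this, assuming the devices send epoch time in milliseconds; if they send seconds, multiply by 1000 first.)

```js
// Convert an epoch timestamp (milliseconds) to an ISO 8601 date-time string.
const toIsoTimestamp = (epochMs) => new Date(epochMs).toISOString();
```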
My Architecture for the System
AWS IoT Core -> Rules Engine -> Kinesis Data Streams -> Firehose
-> S3 (and a Node.js consumer consuming data from Kinesis Data Streams)
What I Want
The amount of data is huge, so I want a way to batch it in the backend, using Redis or another technology, before writing it to MongoDB, to avoid frequent writes to the database.
What I Hope To Learn
- Does this architecture make sense? If not, can you suggest a better alternative?
- What is the best way to use Redis as a buffer for the IoT data?
- What is the best way to read from the stream and write to a Redis buffer in order to perform bulk inserts into MongoDB?
2 Answers
Your architecture makes sense for processing high-volume IoT data streams. Ingesting data from many devices with AWS IoT Core, routing it through Kinesis Data Streams and Firehose, and finally storing it in Amazon S3 is a common and effective way to handle large amounts of data.
Using Redis as a buffer is a good idea for batching the data before writing it to MongoDB. Redis is an in-memory data structure store that can be used as a database, cache, or message broker, and it is often placed between an application and a database to improve performance and reduce write frequency. It provides data structures such as lists, sets, and hashes, which can be used to store and manipulate data efficiently.
To use Redis as a buffer, you can write a Node.js script that reads data from Kinesis Data Streams and writes it to Redis. You can use a Redis client for Node.js to connect and perform operations such as adding data to lists or sets. You can also set a timer to flush the buffered data from Redis to MongoDB periodically, for example every few minutes, to avoid frequent writes to MongoDB.
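As a rough sketch (assuming the node-redis v4 client and the official MongoDB driver, with hypothetical key, database, and collection names), the buffering and periodic flush could look something like this:

```js
// buffer-flush.js -- minimal sketch, not production-ready
const { createClient } = require('redis');
const { MongoClient } = require('mongodb');

const BUFFER_KEY = 'iot:buffer';      // hypothetical Redis list key
const FLUSH_INTERVAL_MS = 60_000;     // flush once a minute
const BATCH_SIZE = 5000;              // max documents per bulk insert

const redis = createClient({ url: 'redis://localhost:6379' });
const mongo = new MongoClient('mongodb://localhost:27017');

// Called by your Kinesis consumer for every decoded record.
async function bufferRecord(record) {
  await redis.rPush(BUFFER_KEY, JSON.stringify(record));
}

// Periodically drain the list into MongoDB with one bulk insert.
async function flushBuffer() {
  const items = await redis.lRange(BUFFER_KEY, 0, BATCH_SIZE - 1);
  if (items.length === 0) return;

  const docs = items.map((item) => JSON.parse(item));
  await mongo.db('iot').collection('readings').insertMany(docs);

  // Remove only what was flushed, so records added in the meantime stay buffered.
  await redis.lTrim(BUFFER_KEY, items.length, -1);
}

async function main() {
  await redis.connect();
  await mongo.connect();
  setInterval(() => flushBuffer().catch(console.error), FLUSH_INTERVAL_MS);
}

main().catch(console.error);
```

Note there is a small window between the read and the trim: if the process crashes in between, that batch could be inserted twice on restart, so consider making inserts idempotent (for example, a unique index on device ID plus timestamp) if that matters for your data.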
My first thought is that you could use Redis Streams. Add events as they arrive with XADD, then periodically read them back with XRANGE (or XREAD) and trim the processed entries with XTRIM or XDEL.
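A minimal sketch of that idea (again assuming node-redis v4, the official MongoDB driver, and hypothetical key names; here XDEL drops entries after they are flushed):

```js
// streams-buffer.js -- minimal sketch of the Redis Streams approach
const { createClient } = require('redis');
const { MongoClient } = require('mongodb');

const STREAM_KEY = 'iot:events';   // hypothetical stream key
const FLUSH_INTERVAL_MS = 60_000;

const redis = createClient({ url: 'redis://localhost:6379' });
const mongo = new MongoClient('mongodb://localhost:27017');

// XADD one entry per incoming record; stream field values must be strings.
async function addEvent(record) {
  await redis.xAdd(STREAM_KEY, '*', { payload: JSON.stringify(record) });
}

// Read everything currently in the stream, bulk-insert, then delete those entries.
async function flushStream() {
  const entries = await redis.xRange(STREAM_KEY, '-', '+');
  if (entries.length === 0) return;

  const docs = entries.map((e) => JSON.parse(e.message.payload));
  await mongo.db('iot').collection('readings').insertMany(docs);

  // Drop only the entries we just processed.
  await redis.xDel(STREAM_KEY, entries.map((e) => e.id));
}

async function main() {
  await redis.connect();
  await mongo.connect();
  setInterval(() => flushStream().catch(console.error), FLUSH_INTERVAL_MS);
}

main().catch(console.error);
```

If you end up running multiple consumer processes, consumer groups (XREADGROUP/XACK) would be the more robust version of this pattern, since each entry is tracked until it is explicitly acknowledged.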