If I have KBs of data every few seconds, is using Kinesis Firehose with a Lambda function to perform a transformation and using Redshift as the target necessarily better than just doing the same except with S3 instead of Kinesis? I know Kinesis is intended for real-time processing but is there actually a benefit to using it rather than just using S3 and having files dropped into S3 trigger a lambda function for processing and storing into Redshift? They seem equivalent other than that Kinesis is associated with real-time processing while S3 is not.
Question posted in Amazon Web Services
The official Amazon Web Services documentation can be found here.
3 Answers
Amazon Kinesis Data Firehose can combine streams of data into fewer, larger objects in Amazon S3 based on size or time. This makes it easier to store in S3 and load into Redshift.
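The size-or-time buffering is the core of what Firehose does. A minimal sketch of the idea in pure Python, where `sink` stands in for the S3 PUT and the thresholds mirror Firehose's configurable buffer hints (all names here are illustrative, not a real AWS API):

```python
import time

class Buffer:
    """Accumulate small records and flush them as one larger object,
    whichever of the size or time threshold is hit first -- an
    illustrative stand-in for Firehose's buffer-size/buffer-interval hints."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_seconds=300, sink=None):
        self.max_bytes = max_bytes      # Firehose lets you tune this (MBs)
        self.max_seconds = max_seconds  # and this (seconds)
        self.sink = sink or (lambda blob: None)  # where the S3 PUT would go
        self.records, self.size, self.started = [], 0, None

    def add(self, record: bytes):
        if self.started is None:
            self.started = time.monotonic()
        self.records.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.started >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.records:
            # One large object instead of many tiny ones
            self.sink(b"".join(self.records))
        self.records, self.size, self.started = [], 0, None

# Usage: two small records come out as a single combined object
flushed = []
buf = Buffer(max_bytes=10, max_seconds=9999, sink=flushed.append)
buf.add(b"kb-of-")
buf.add(b"data")  # hits the 10-byte threshold, so the buffer flushes
```

This is exactly the piece you would otherwise have to build and operate yourself in the S3-plus-Lambda design.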
Amazon Redshift performs poorly if you are continually using INSERT on a few rows of data, compared to using COPY on a larger set of data (which allows parallel loading too).

Amazon Kinesis Firehose will take care of the full process of receiving data through to inserting it into Redshift. If you want to do it yourself, you'll presumably trigger an AWS Lambda function for each object, and you'll need to write code to insert the data into Redshift and handle errors. It's really a matter of balancing cost against convenience.
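The difference matters because a single COPY pulls every object under an S3 prefix in parallel across the cluster's slices, while row-by-row INSERTs serialize through the leader node. A sketch of the two kinds of statement a hand-rolled loader might issue (the table, bucket, and IAM role names are made up):

```python
def copy_statement(table, s3_prefix, iam_role):
    """One COPY loads every object under the prefix, in parallel."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS JSON 'auto';"
    )

def insert_statements(table, rows):
    """The slow alternative: one INSERT per row through the leader node."""
    return [
        f"INSERT INTO {table} VALUES ({', '.join(map(repr, row))});"
        for row in rows
    ]

sql = copy_statement(
    "events",
    "s3://my-bucket/batch/",                       # hypothetical bucket
    "arn:aws:iam::123456789012:role/redshift-load" # hypothetical role
)
```

Firehose issues the COPY-style load for you; with the Lambda route you write, run, and error-handle this code yourself.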
The biggest benefit of Kinesis Firehose is the buffering functionality. This part bundles the incoming events into S3 files of reasonable size, because loading many small files into Redshift can be very inefficient. So your Lambda process would also need to bundle records into Redshift-loadable files of significant size.
Good question.
If you don't use Firehose, the real issues will be cost and performance.
Cost
S3 storage is cheap, but each PUT request also has a cost. With many small files your storage cost stays low, but you pay for far more PUT requests. So you would want to accumulate many records into bigger files before putting them into S3. Firehose does that for you; otherwise you need to write something yourself and run it somewhere.
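The PUT-request trade-off is easy to see with rough numbers. The $0.005 per 1,000 PUTs figure below is S3 Standard's published request price at the time of writing; verify against current pricing before relying on it:

```python
PUT_PRICE_PER_1000 = 0.005  # USD; S3 Standard PUT price -- check current pricing

def monthly_put_cost(objects_per_second: float) -> float:
    """Rough monthly S3 PUT-request cost for a given object rate."""
    puts_per_month = objects_per_second * 60 * 60 * 24 * 30
    return puts_per_month / 1000 * PUT_PRICE_PER_1000

# One small object every 2 seconds vs. one buffered object every 5 minutes:
unbuffered = monthly_put_cost(1 / 2)    # ~1.3M PUTs/month -> about $6.48
buffered   = monthly_put_cost(1 / 300)  # ~8,640 PUTs/month -> about $0.04
```

The dollar amounts are small at KB-scale traffic, but the same ratio (buffering cuts request count by two orders of magnitude here) holds as volume grows.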
Performance
Almost all data lakes perform better with less frequent, larger inserts, and Redshift is no different. So you would again want to combine all the small files into one big file and then load/insert it into Redshift. Firehose does this for you as well: it accumulates the small files and creates bigger ones.