We’ve configured an MSK (Kafka) event source as the trigger for our Lambda function. Even though the offset lag keeps increasing, the Lambda concurrency stays at 4-5 almost all the time, as can be seen in the graph below. The configuration used for the MSK event source is:
Batch Size: 50
Batch window: 30 seconds
Number of partitions in the Kafka topic: 10
I made sure that the load is distributed equally across all the partitions. Is there anything I’m missing here that is limiting the concurrency? Any solution is appreciated. Thanks in advance.
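For readers who want to reproduce the setup, the settings listed in the question could be written as parameters for boto3's `create_event_source_mapping`. This is only a sketch: the function name, cluster ARN, and topic name are hypothetical placeholders, and `StartingPosition` is an assumption, since the question does not say which one is used.

```python
def build_msk_mapping_params(function_name: str, cluster_arn: str, topic: str) -> dict:
    """Return keyword arguments for lambda_client.create_event_source_mapping."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": cluster_arn,
        "Topics": [topic],
        "BatchSize": 50,                       # "Batch Size: 50" from the question
        "MaximumBatchingWindowInSeconds": 30,  # "Batch window: 30 seconds"
        "StartingPosition": "LATEST",          # assumption, not stated in the question
    }


def create_mapping(params: dict) -> dict:
    """Create the mapping; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("lambda").create_event_source_mapping(**params)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    params = build_msk_mapping_params(
        "my-consumer-fn",
        "arn:aws:kafka:us-east-1:111122223333:cluster/example/abc",
        "my-topic",
    )
    print(params)
```

Note that neither of these settings directly sets the number of concurrent consumers; they only control how many records each invocation receives and how long Lambda waits to fill a batch.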
2 Answers
I think you are hitting the same limitation we ran into some months ago. This link pointed us in the right direction (to a workaround, in our case):
AWS MSK lambda concurrent consumers
It honestly makes sense that the partitions are not being used to their full capacity, because the jump from the MSK EC2 setup to the Lambda runtime is not trivial. Maybe you can try other connectors:
https://docs.confluent.io/kafka-connectors/aws-lambda/current/overview.html#multiple-tasks
It also makes sense that bridging through Kinesis would avoid these specific issues, as it is all Amazon-native.
Ideally, the concurrency should be a whole number that matches the number of consumers.
source: https://docs.amazonaws.cn/en_us/lambda/latest/dg/with-kafka.html#services-kafka-scaling
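As a rough illustration of that point: the linked scaling documentation caps the number of consumers at one per partition, so a 10-partition topic can have at most 10 concurrent consumers. The helper names below are hypothetical, and the sketch only does the arithmetic; it does not query AWS.

```python
import math


def max_consumers(partition_count: int) -> int:
    """Upper bound on concurrent consumers: one consumer per partition."""
    if partition_count < 1:
        raise ValueError("a topic needs at least one partition")
    return partition_count


def unused_consumers(partition_count: int, observed_concurrency: float) -> int:
    """How many more consumers Lambda could still add for this topic.

    The observed value may be fractional when it is an average read off a
    graph, so it is rounded up to a whole number of consumers first.
    """
    in_use = math.ceil(observed_concurrency)
    return max(0, max_consumers(partition_count) - in_use)


if __name__ == "__main__":
    # Values from the question: 10 partitions, concurrency hovering at 4-5.
    print(unused_consumers(10, 4.5))
```

With the numbers from the question, roughly half of the possible consumers are not in use, which is consistent with the growing offset lag.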
The offset lag indicates a performance issue; this blog post gives a better explanation: Offset lag metric