Hello, I'm having a problem with Lambda.
Our lambdas generate images on demand. This is done with Konva.js and node-canvas (node-canvas is in a layer).
Whenever our lambda is under sustained load (calling the endpoint in a loop with await; the problem occurs regardless of concurrency), the memory usage keeps rising with each invocation until it eventually hits 100% and the Lambda runtime is killed. It's like the runtime never garbage collects anything. We have tried increasing the memory all the way to 5 GB, but the issue still occurs (although we can call it more times before it runs out).
Our setup consists of an API Gateway v2 HTTP endpoint in front of the lambda. The lambda is placed in our VPC, in a private subnet with a NAT gateway. Everything is deployed with CDK.
The function roughly does this (a sketch follows the list):
- Parse the URL.
- If the image already exists in S3, return it.
- Else:
  - Get the necessary data from our DB (MySQL Aurora). The connection is a variable outside the handler.
  - Download the necessary fonts from S3.
  - Generate the image. The image contains another image that we pass in the URL and download.
  - Upload the image to S3.
  - Return it as `Buffer.toString('base64')`.
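A rough sketch of that flow, assuming the AWS SDK v2 that ships with the runtime; the bucket name, key derivation, and `generateImage` helper are placeholders, not our real code, and error handling is simplified:

```typescript
import { S3 } from 'aws-sdk';
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from 'aws-lambda';

// Module scope: reused across invocations in a warm container.
const s3 = new S3();
const BUCKET = process.env.BUCKET ?? 'my-bucket'; // placeholder bucket name

export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  const key = event.rawPath.replace(/^\//, ''); // 1. parse the URL into an S3 key

  // 2. If the image already exists in S3, return it.
  try {
    const cached = await s3.getObject({ Bucket: BUCKET, Key: key }).promise();
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'image/png' },
      body: (cached.Body as Buffer).toString('base64'),
      isBase64Encoded: true,
    };
  } catch {
    // Not cached: fall through to generation.
  }

  // 3. Fetch data from Aurora, download fonts to /tmp, download the
  //    embedded image, and render with Konva + node-canvas.
  //    (Omitted here; generateImage is a placeholder for our real code.)
  const image: Buffer = await generateImage(key);

  // 4. Upload to S3, then return the image as base64.
  await s3.putObject({ Bucket: BUCKET, Key: key, Body: image }).promise();
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    body: image.toString('base64'),
    isBase64Encoded: true,
  };
};

// Placeholder for the Konva/node-canvas rendering step.
declare function generateImage(key: string): Promise<Buffer>;
```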
Based on this (we don't use sharp) https://github.com/lovell/sharp/issues/1710#issuecomment-494110353, we tried increasing the memory as mentioned above (from 1 GB -> 2 -> 3 -> 5 GB). But it still shows the same memory increase until it dies, and then it starts all over.
Edit:
The function is written in TypeScript. The memory usage is measured in the Lambda Insights console, where we can see it gradually increase (as a percentage) after each invocation.
We only store the fonts in /tmp/fonts, and only if they do not already exist (i.e. the disk usage doesn't increase; we tested with the same fonts, so they are only downloaded on the first invocation). A sketch of that caching pattern follows.
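Roughly like this (the bucket and key are placeholders; `ensureFont` is a hypothetical name, not our real helper):

```typescript
import { existsSync, mkdirSync, writeFileSync } from 'fs';
import { S3 } from 'aws-sdk';

const s3 = new S3();

// Download a font into /tmp/fonts only if it is not already there. /tmp
// persists across invocations of a warm container, so each font is
// downloaded once and disk usage stays flat afterwards.
async function ensureFont(bucket: string, key: string): Promise<string> {
  const localPath = `/tmp/fonts/${key.split('/').pop()}`;
  if (!existsSync(localPath)) {
    mkdirSync('/tmp/fonts', { recursive: true });
    const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    writeFileSync(localPath, obj.Body as Buffer);
  }
  return localPath;
}
```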
The more memory the function has, the longer it takes before it hits 100% and crashes (with 5 GB we can do ~170 invocations in a row before it crashes, and then another ~170).
There are no Lambda/S3 triggers, so it is not an infinite loop.
Answers:
So I managed to narrow down the issue to the Konva library.
I found this issue: https://github.com/Automattic/node-canvas/issues/1974 where somebody commented with a link about Node buffers, so I decided to test these parts in isolation.
First, by purely allocating a buffer of the same size as the generated image.
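A minimal version of that test looks something like this (the 5 MB size here is a placeholder; we matched the real image size):

```typescript
// Isolation test: allocate a Buffer roughly the size of a generated image
// on each invocation and return it, with no Konva or node-canvas involved.
export const handler = async () => {
  const buf = Buffer.alloc(5 * 1024 * 1024); // stands in for the image
  return {
    statusCode: 200,
    body: buf.toString('base64'),
    isBase64Encoded: true,
  };
};
```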
This resulted in no issues.
Next I tried pure Konva (we use a modified version), because I found this: https://github.com/konvajs/konva/issues/1247
Here the issue was back. Then I tried using only node-canvas, since the above issue mentioned leaks in it.
This didn't have the same issue, so Konva had to be the culprit. I noticed in the above-mentioned GitHub issue that they used
stage.destroy()
which I didn't do before. I simply added that and the issue seems to have gone away.I hope other in a similar situation can find this helpful. But thanks for all the suggestions!
> The connection is a variable outside the handler.
This sentence points at one of the problems. If you create a variable outside the handler, it becomes global and is reused for subsequent executions. If you have the same logic for other variables, you can move them into the handler. You must also shut down your DB connection when you are done with the DB.
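A sketch of the pattern, assuming mysql2 (the library and the config names are assumptions, not from the question):

```typescript
import mysql from 'mysql2/promise';

// Module scope: survives across invocations of a warm container. Anything
// that accumulates here (caches, buffers, listeners) can look like a leak.
const pool = mysql.createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  connectionLimit: 2, // keep the pool small in Lambda
});

export const handler = async (): Promise<unknown> => {
  // Per-invocation state belongs inside the handler so it can be
  // garbage collected when the invocation ends.
  const conn = await pool.getConnection();
  try {
    const [rows] = await conn.query('SELECT 1');
    return rows;
  } finally {
    conn.release(); // give the connection back when you are done with it
  }
};
```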
Next, you can check your high-memory-consuming operations and verify whether it is heap memory or stack memory that is growing.
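One simple way to check is to log process.memoryUsage() at the start of each invocation. Note that node-canvas allocates image data in native memory outside the V8 heap, so growth there shows up in rss/external rather than heapUsed:

```typescript
// Log where the memory actually lives on every invocation.
export const handler = async (): Promise<void> => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log({
    rssMB: (rss / 1024 / 1024).toFixed(1),        // total resident memory
    heapTotalMB: (heapTotal / 1024 / 1024).toFixed(1),
    heapUsedMB: (heapUsed / 1024 / 1024).toFixed(1), // V8 heap only
    externalMB: (external / 1024 / 1024).toFixed(1), // native allocations
  });
  // ...rest of the handler
};
```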