
I have a Node.js (Express.js) server that acts as a BFF for my React.js website. I use Node.js for SSR, for proxying some requests, and for caching some pages in Redis. Recently I found that my server goes down from time to time; its uptime is about 2 days. After a restart everything is OK, but then the response time grows hour by hour. I have resource monitoring on this server, and I can see that it has no problems with RAM or CPU: it uses about 30% of RAM and 20% of CPU.
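For context, the Redis page cache is roughly shaped like this (a simplified sketch, not the production code; it assumes the node-redis v4 client, and renderApp() stands in for the app-specific SSR entry point):

    const express = require('express');
    const { createClient } = require('redis');

    const app = express();
    const redis = createClient();
    redis.connect(); // node-redis v4 requires an explicit connect

    app.get('/', async (req, res, next) => {
      try {
        // Serve the cached render if present; otherwise SSR and cache the result.
        const cached = await redis.get(`page:${req.originalUrl}`);
        if (cached) return res.send(cached);
        const html = await renderApp(req); // hypothetical SSR entry point
        await redis.set(`page:${req.originalUrl}`, html, { EX: 60 });
        res.send(html);
      } catch (err) {
        next(err);
      }
    });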

I regret to say it's a big production site and I can't make a minimal reproducible example, because I don't know where the cause of this error is 🙁

Apart from memory and CPU leaks, what could be the reasons for a Node.js server going down?

I need at least a direction to search in.

UPDATE 1:

"went down" – its when kubernetes kills container due 3 failed life checks (GET request to a root / of website)

My site doesn't use any DB connection, but it calls a lot of third-party APIs: about 6 API requests per GET / request from the browser.

UPDATE 2:

Thanks for your answers, guys.
To understand what happens inside my GET / requests, I added OpenTelemetry to my server. In the long-running and timed-out GET / requests I saw long API requests with very large tcp.connect and tls.connect spans.

[screenshot: trace of a slow GET / request showing long tcp.connect and tls.connect spans]

I think it happens due to a lack of connections or something like that. I think Mostafa Nazari is right.
I'll create a patch and apply it within the next couple of days, and then I'll report whether the problem is gone.
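For reference, my tracing setup is roughly the sketch below (minimal, assuming @opentelemetry/sdk-node with @opentelemetry/auto-instrumentations-node and a console exporter; swap in the exporter for your tracing backend):

    // tracing.js: load it before the app, e.g. node -r ./tracing.js server.js
    const { NodeSDK } = require('@opentelemetry/sdk-node');
    const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
    const { ConsoleSpanExporter } = require('@opentelemetry/sdk-trace-node');

    const sdk = new NodeSDK({
      traceExporter: new ConsoleSpanExporter(),
      // Auto-instruments http/https, express, dns, net, etc., so outgoing API
      // calls show up as spans with connection-level timing attached.
      instrumentations: [getNodeAutoInstrumentations()],
    });

    sdk.start();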


PS: I can't mark two replies as the answer. Joe's answer is the most detailed and Mostafa Nazari's is the most relevant to my problem. They could both be "best answers".

Thanks for the help, guys.

4 Answers


  1. Chosen as BEST ANSWER

    I solved the problem. It really was a lack of connections. I added connection reuse for node-fetch via keepAlive, plus a lot of caching to save connections. And it works.

    Thanks for all your answers. They are all right, but the most helpful thing was adding OpenTelemetry to my server to understand what exactly happens inside a request.

    For other people with these problems: as a first step, I strongly recommend adding telemetry to your project.

    https://opentelemetry.io/
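
    Roughly, the fix looked like the sketch below (simplified, assuming node-fetch v2 and Node's built-in http/https agents; maxSockets is an illustrative value, not the production setting):

        const fetch = require('node-fetch');
        const http = require('http');
        const https = require('https');

        // Keep-alive agents reuse TCP/TLS connections instead of paying
        // tcp.connect + tls.connect (and two file descriptors) on every call.
        const httpAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });
        const httpsAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

        // node-fetch v2 accepts a function that picks an agent per parsed URL.
        const agent = (url) => (url.protocol === 'http:' ? httpAgent : httpsAgent);

        async function callApi(url) {
          const res = await fetch(url, { agent });
          if (!res.ok) throw new Error(`upstream returned ${res.status}`);
          return res.json();
        }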


  2. Here are some of the many possible reasons why your server may go down:

    • Memory leaks: The server may eventually fail if the Node.js application is leaking memory, as you noted in your post. This can occur if the application keeps adding new objects to memory without cleaning them up appropriately.

    • Unhandled exceptions: The server may crash if an exception is thrown in the application code and is not caught. To avoid this, ensure that all exceptions are handled properly (see the sketch after this list).

    • Third-party libraries: If the application uses any third-party libraries, the server may experience problems as a result. Before using them, consider examining their resource usage, versions, and updates.

    • Network connection: The server's network connection may have issues if the server is sending a lot of queries to third-party APIs or if the connection is unstable. Verify that the server handles connections, timeouts, and retries appropriately.

    • Database connections: Even though your server doesn't use any DB connections, it's a good idea to look for stale database connections that could be problematic.

    • High volumes of traffic: The server may experience performance issues if it receives a lot of traffic. Make sure the server is set up to handle heavy traffic, using load balancing, caching, and other speed-enhancement techniques. Cloudflare is always a good option 😉

    • Concurrent requests: Performance problems may arise if the server is managing many concurrent requests. Check that the server is configured to handle several requests at once, using tools such as a connection pool, a thread pool, or other concurrency-management strategies.

    (Credit goes to my System Analysis and Design course slides)
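
    For the unhandled-exceptions point, here is a minimal sketch of last-resort process handlers (log, then exit so the orchestrator restarts a clean container):

        // Last-resort handlers: without them, an uncaught exception kills the
        // process with no log line explaining why.
        process.on('uncaughtException', (err) => {
          console.error('uncaught exception:', err);
          process.exit(1); // exit so Kubernetes restarts a fresh container
        });

        process.on('unhandledRejection', (reason) => {
          console.error('unhandled promise rejection:', reason);
          process.exit(1);
        });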

  3. Gradual growth of response time suggests some kind of leak.
    If CPU and memory consumption are excluded, other potentially limiting resources include:

    1. File descriptors: your server may be forgetting to close files or sockets. Monitor the number of entries in /proc/<PID>/fd/* to confirm this; see what those files are and find which code misbehaves (see the sketch after this list).

    2. Directory listing: even a temporary directory holding a lot of files takes some time to scan, and if your application fails to remove temporary files and then lists that directory, you will be in trouble quickly.

    3. Zombie processes: just monitor the total number of processes on the server.

    4. Firewall rules (some Docker network magic may, in theory, cause this on the host system): monitor the length of the output of "iptables -L" or "iptables-save" (or the equivalent on modern kernels). A rare condition.

    5. Memory fragmentation: this may happen in languages with garbage collection, but it often leaves traces such as "Cannot allocate memory" in the logs. A rare condition, and hard to fix. Export some health metrics and make your k8s restart your pod preemptively.

    6. Application bugs/implementation problems: this really depends on the internal logic of the app. There may be some data structure that fills with data over time in some tricky way, becoming O(N) instead of O(1). Really hard to track down, unless you have managed to reproduce the condition in a lab/test environment.

    7. API calls from the frontend may shift toward shorter but more CPU-hungry ones. Monitor the distribution of API call types over time.
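
    For point 1, a tiny sketch of how the Node.js process could export its own FD count as a health metric (Linux only; the console logging is illustrative, wire it to your metrics system instead):

        const fs = require('fs');

        // Count this process's open file descriptors via /proc (Linux only).
        function openFdCount() {
          return fs.readdirSync('/proc/self/fd').length;
        }

        // A steadily rising value points to a file or socket leak.
        setInterval(() => {
          console.log('open_fds:', openFdCount());
        }, 60_000);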

  4. With any incoming/outgoing web request, two file descriptors are acquired. As there is a limit on the number of FDs, the OS does not let a new socket be opened once that limit is reached, and this situation causes a "Timeout Error" on clients. You can easily check the number of open FDs with sudo ls -la /proc/_PID_/fd/ | tail -n +4 | wc -l, where _PID_ is the Node.js PID. If this value keeps rising, you have a connection leak.

    I guess you need to do the following to prevent the connection leak:

    1. Make sure you are closing the HTTP connections of your outgoing API calls (this depends on how you open them; some libraries manage it and you just need to configure them).

    2. Cache your outgoing API calls (if possible) to reduce the number of calls.

    3. For your outgoing API calls, use a connection pool; this manages the number of open HTTP connections, reuses already-opened connections, and so on.

    4. Review your code so that you can serve each request faster than you do now (for example, make your API calls in parallel instead of awaited or nested calls; see the sketch after this list). Anything you do to make your response faster helps prevent this situation.
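
    For point 4, a sketch of running the upstream calls in parallel (the URLs are hypothetical; assumes node-fetch and that the calls are independent):

        const fetch = require('node-fetch');

        async function getJson(url) {
          const res = await fetch(url);
          if (!res.ok) throw new Error(`upstream ${url} returned ${res.status}`);
          return res.json();
        }

        // Sequential awaits hold the incoming request open for the sum of the
        // upstream latencies; Promise.all pays only the slowest one.
        async function buildPageData() {
          const [user, posts, comments] = await Promise.all([
            getJson('https://api.example.com/user'),     // hypothetical endpoints
            getJson('https://api.example.com/posts'),
            getJson('https://api.example.com/comments'),
          ]);
          return { user, posts, comments };
        }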
