
I have an async socket server written in C#, running on a Lightsail server running Amazon Linux. It consists of a TcpListener that accepts connections, starts up a new thread to listen when someone connects, initiates an SSL connection, and then acts as a server for an online game.

The server works fine for a while, until all networking on it suddenly stops. The failure takes anywhere from 22 hours to one week to occur. The symptoms are as follows:

  1. Anyone already connected to the server will suddenly stop receiving/sending data. I can see in the logs that my inactivity checking code will eventually kick them for not sending heartbeat packets.
  2. The server will also be unable to connect to its MySQL database (which is running on the same system, so it’s unable to connect to localhost? I can still access it through PHPMyAdmin during this time).
  3. It is, however, still able to write both to files and to console, as my logger is still able to write to both.

The code looks like everyone else’s (I did try the changes suggested for this question, but it still crashed after ~24 hours). Nothing is logged before the failure, so the server apparently never hits an exception, which is why I’ve been having trouble figuring this one out.

For completeness, here is my main loop:

public void ListenLoop()
{
    TcpListener listener = new TcpListener(IPAddress.Any, 26000);
    listener.Start();

    while (true)
    {
        try
        {
            if (listener.Pending())
            {
                listener.BeginAcceptTcpClient(new AsyncCallback(AcceptConnection), listener);
                Logger.Write(Logger.Level.INFO, "continuing the main loop");
            }
            // Yield so we're not stuck in a busy-loop
            Thread.Sleep(5);
        }
        catch (Exception e)
        {
            Logger.Write(Logger.Level.ERROR, $"Error while waiting for listeners: {e.Message}\n{e.StackTrace}");
        }
    }
}

and here are the accept parts:

/// <summary>
/// Finish an async callback but spawn a new thread to handle it if necessary
/// </summary>
/// <param name="ar"></param>
private void AcceptConnection(IAsyncResult ar)
{
    if (ar.CompletedSynchronously)
    {
        // Force the accept logic to run async, to keep our listening
        // thread free.
        Action accept = () => AcceptCallback(ar);
        accept.BeginInvoke(accept.EndInvoke, null);
    } else
    {
        AcceptCallback(ar);
    }
}

private void AcceptCallback(IAsyncResult ar)
{
    try
    {
        TcpListener listener = (TcpListener) ar.AsyncState;
        TcpClient client = listener.EndAcceptTcpClient(ar);
        // If the SSL connection takes longer than 5s we have a problem, and should stop
        client.Client.ReceiveTimeout = 5000;

        // Attempt to get the IP address of the client we're connecting to
        IPEndPoint ipep = (IPEndPoint)client.Client.RemoteEndPoint;
        string ip = ipep.Address.ToString();
        Logger.Write(Logger.Level.INFO, $"Connection begun to {ip}");

        // Authenticate and begin communicating with the client
        SslStream stream = new SslStream(client.GetStream(), false);
        try
        {
            stream.AuthenticateAsServer(
                serverCertificate,
                enabledSslProtocols: System.Security.Authentication.SslProtocols.Tls12,
                clientCertificateRequired: false,
                checkCertificateRevocation: true
                );

            stream.ReadTimeout = 3600000;
            stream.WriteTimeout = 3600000;

            NetworkPlayer player = new NetworkPlayer();
            player.Name = ip;
            player.Connection.Stream = stream;
            player.Connection.Connected = true;
            player.Connection.Client = client;
            stream.BeginRead(player.Connection.Buffer, 0, 1024, new AsyncCallback(ReadCallback), player);
        }
        catch (Exception e)
        {
            Logger.Write(Logger.Level.ERROR, $"Error while starting the connection to {ip}: {e.Message}");
            // The following code just calls stream.Close(); and client.Close(); but sends exceptions to my logger.
            CloseConnectionSafely(client, stream);
        }
    }
    catch (Exception e)
    {
        Logger.Write(Logger.Level.ERROR, $"Error while starting a connection to an unknown user: {e.Message}");
    }
}

2 Answers


  1. Chosen as BEST ANSWER

    The solution I found, after consulting some people more familiar with C# than I am, is that I was running into thread pool exhaustion. Essentially, I had a number of other async tasks (not shown in the question, as they didn't look like they could cause what I was seeing) that were stuck in extremely long I/O operations, talking to users who had either disconnected improperly or were on very high-latency connections. These held on to pool threads and prevented the async AcceptCallback in my post from being picked up by the thread pool. This explains the other side effects I outlined in the question:

    1. Creating a new connection to a MySQL database involves an async task behind-the-scenes, which was being starved out due to exhaustion.
    2. Completing the EndAcceptTcpClient required my async task to run, which requires an available thread.
    3. Tasks which did not involve the async keyword, such as Timer() bound tasks (like my logger I/O) were unaffected and could still run.

    My solution involved reducing the number of synchronization steps elsewhere in my program, and restructuring any tasks that could take a long time to execute so that they didn't block threads. Thank you to everyone who looked/commented.
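One way to check for this kind of starvation is to log thread pool usage from a thread that is not itself part of the pool. The following is a minimal sketch, not the code from my server: the ThreadPoolMonitor class and its method names are hypothetical, and it only relies on ThreadPool.GetMaxThreads and ThreadPool.GetAvailableThreads. A busy count that keeps climbing while traffic stays flat would suggest that pool threads are being held by blocking calls.

```csharp
using System;
using System.Threading;

// Hypothetical helper: reports thread pool usage from a dedicated thread,
// so the report still appears when the pool itself is starved.
public static class ThreadPoolMonitor
{
    // Returns a one-line snapshot of how many pool threads are currently in use.
    public static string Snapshot()
    {
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
        return $"workers busy: {maxWorkers - freeWorkers}/{maxWorkers}, " +
               $"IO busy: {maxIo - freeIo}/{maxIo}";
    }

    // Starts a dedicated (non-pool) background thread that passes a snapshot
    // to the given callback once per interval.
    public static Thread Start(TimeSpan interval, Action<string> report)
    {
        var thread = new Thread(() =>
        {
            while (true)
            {
                report(Snapshot());
                Thread.Sleep(interval);
            }
        });
        thread.IsBackground = true; // do not keep the process alive on exit
        thread.Name = "ThreadPoolMonitor";
        thread.Start();
        return thread;
    }
}
```

With the logger from the question, this could be started once at startup with something like `ThreadPoolMonitor.Start(TimeSpan.FromSeconds(30), line => Logger.Write(Logger.Level.INFO, line));`.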


  2. I’m guessing that your primary issue is that you are not disposing the stream and therefore you are getting socket exhaustion.

    Apart from that I would advise you to move to fully async code using Task.

    public async Task ListenLoop(CancellationToken cancel)  // use a cancellation token to shut down the loop
    {
        // TcpListener is not IDisposable on older runtimes, so stop it in a finally block rather than a using
        var listener = new TcpListener(IPAddress.Any, 26000);
        listener.Start();
        try
        {
            while (!cancel.IsCancellationRequested)
            {
                try
                {
                    // the overload that takes a CancellationToken requires .NET 6 or later
                    var client = await listener.AcceptTcpClientAsync(cancel);
                    // hand the connection off without awaiting it, so the loop can accept the next client
                    _ = Task.Run(() => AcceptConnection(client, cancel));
                    Logger.Write(Logger.Level.INFO, "continuing the main loop");
                    // no need to yield due to async
                }
                catch (OperationCanceledException) { }
                catch (Exception e)
                {
                    Logger.Write(Logger.Level.ERROR, $"Error while waiting for listeners: {e.Message}\n{e.StackTrace}");
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
    
    
    private async Task AcceptConnection(TcpClient client, CancellationToken cancel)
    {
        try
        {
            using (client)
            {
                // If the SSL connection takes longer than 5s we have a problem, and should stop
                client.Client.ReceiveTimeout = 5000;
                await AcceptConnectionImpl(client, cancel);
            }
        }
        catch (OperationCanceledException) { }
        catch (Exception e)
        {
            Logger.Write(Logger.Level.ERROR, $"Error while starting a connection to an unknown user: {e.Message}");
        }
    }
    
    private async Task AcceptConnectionImpl(TcpClient client, CancellationToken cancel)
    {
        // Attempt to get the IP address of the client we're connecting to
        IPEndPoint ipep = (IPEndPoint)client.Client.RemoteEndPoint;
        string ip = ipep.Address.ToString();  // used below for the player name and the error log
        Logger.Write(Logger.Level.INFO, $"Connection begun to {ip}");
    
        // Authenticate and begin communicating with the client
        using (SslStream stream = new SslStream(client.GetStream(), false))
        {
            try
            {
                await stream.AuthenticateAsServerAsync(
                    serverCertificate,
                    enabledSslProtocols: System.Security.Authentication.SslProtocols.Tls12,
                    clientCertificateRequired: false,
                    checkCertificateRevocation: true
                    );
    
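                // Note: ReadTimeout/WriteTimeout only apply to synchronous Read/Write calls;
                // reads made with ReadAsync need a CancellationToken-based timeout instead.
                stream.ReadTimeout = 3600000;
                stream.WriteTimeout = 3600000;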
    
                NetworkPlayer player = new NetworkPlayer();
                player.Name = ip;
                player.Connection.Stream = stream;
                player.Connection.Connected = true;
                player.Connection.Client = client;
                player.Cancellation = cancel;
                await player.YourReadLoopAsync();
            }
            catch (OperationCanceledException) { }
            catch (Exception e)
            {
                Logger.Write(Logger.Level.ERROR, $"Error while starting the connection to {ip}: {e.Message}");
                // The following code just calls stream.Close(); and client.Close(); but sends exceptions to my logger.
                CloseConnectionSafely(client, stream);
            }
        }
    }
    

    The function YourReadLoopAsync should read data from the stream using ReadAsync, or using classes like StreamReader which also has async functions.
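As a rough illustration of such a read loop, here is a minimal sketch. The ReadLoopSketch class, the onData callback and the idleTimeout parameter are all hypothetical names, not part of the question's code. It assumes a runtime where ReadAsync on the underlying stream honors the token while the read is pending; on older runtimes the token may only be checked before the read starts, so the timeout would not interrupt a stalled client there.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical read loop: one ReadAsync at a time, each with its own idle timeout.
public static class ReadLoopSketch
{
    // Reads until the peer closes the stream. Throws OperationCanceledException
    // if a single read exceeds idleTimeout or if shutdown is requested.
    // Returns the total number of bytes read.
    public static async Task<long> RunAsync(
        Stream stream, TimeSpan idleTimeout, Action<byte[], int> onData, CancellationToken cancel)
    {
        byte[] buffer = new byte[1024];
        long total = 0;
        while (true)
        {
            // Link a per-read timeout to the shutdown token.
            using (var timeout = CancellationTokenSource.CreateLinkedTokenSource(cancel))
            {
                timeout.CancelAfter(idleTimeout);
                int read = await stream.ReadAsync(buffer, 0, buffer.Length, timeout.Token);
                if (read == 0)
                {
                    return total; // the peer closed the connection
                }
                total += read;
                onData(buffer, read); // only the first 'read' bytes are valid
            }
        }
    }
}
```

Because no thread is blocked while a read is pending, a slow or half-dead client costs a timer and a buffer rather than a pool thread. The caller can tell a timeout from a shutdown by checking cancel.IsCancellationRequested when it catches OperationCanceledException.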

    You don’t need to use CancellationToken, but it does make it easier to deal with shutting everything down cleanly. Make sure to catch OperationCanceledException on every try.
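To show how the token could be wired up at the top level, here is a small sketch. The ShutdownSketch class and both method names are hypothetical; it assumes a console process that should stop on Ctrl+C and a server loop with the same shape as ListenLoop above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical top-level wiring for a cancellable server loop.
public static class ShutdownSketch
{
    // Creates a token source that is cancelled when Ctrl+C is pressed.
    public static CancellationTokenSource HookCtrlC()
    {
        var shutdown = new CancellationTokenSource();
        Console.CancelKeyPress += (sender, e) =>
        {
            e.Cancel = true;   // keep the process alive so the loops can unwind
            shutdown.Cancel();
        };
        return shutdown;
    }

    // Runs serverLoop until it finishes on its own or the token source is cancelled.
    // Returns true if cancellation had been requested by the time the loop ended.
    public static async Task<bool> RunUntilCancelledAsync(
        Func<CancellationToken, Task> serverLoop, CancellationTokenSource shutdown)
    {
        try
        {
            await serverLoop(shutdown.Token);
            return shutdown.IsCancellationRequested;
        }
        catch (OperationCanceledException)
        {
            return true;
        }
    }
}
```

A Main method could then call `await ShutdownSketch.RunUntilCancelledAsync(server.ListenLoop, ShutdownSketch.HookCtrlC());`, where server is whatever object owns ListenLoop.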

    See also this link for further tips.
