skip to Main Content

We have experienced a problem with AWS Aurora failover and looking for pointers as to how to resolve.

Scenario

AWS Aurora set up with two end points:

  • Writer:
    • host: stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com
    • resolves to IP: 10.1.0.X
  • Reader:
    • host: stackName-dbcluster-ID.cluster-ro-ID.us-west-2.rds.amazonaws.com
    • resolves to IP: 10.1.0.Y

So therefore our PDO MySQL Connection string is stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com (for writing)

After failover

On failover, the DNS entries are flipped to point as follows:

  • Reader:
    • host: stackName-dbcluster-ID.cluster-ro-ID.us-west-2.rds.amazonaws.com
    • resolves to IP: 10.1.0.X
  • Writer:
    • host: stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com
    • resolves to IP: 10.1.0.Y

Critically, the PDO Connection string (for writing) remains the same “stackName-dbcluster-ID.cluster-ID.us-west-2.rds.amazonaws.com” b ut points to a different IP address.

What Happened

We had error 1290 “SQLSTATE[HY000]: General error: 1290 The MySQL server is running with the –read-only option so it cannot execute this statement”.

As the DB engines are stopped started, our initial persistent connections will have “gone away” and been invalidated (something we immediately handle in a reconnect/retry code).

However the error above means new connections will have been made to the old node, but then not further invalidated with propagation of the DNS change. They lasted 10/15 minutes (well beyond TTL of the DNS).

My Questions

  1. Does anyone know if a persistent connection on PDO is retrieved based on the connection string, or is more reliable using the IP or other signature? Evidence suggests it’s hostname, but would like confirmation.
  2. Does anyone know a way to mark a persistent connection as “invalid” in PDO, so that is it not used again?
  3. Or, is there something I missed?

Side notes

We already have code in place to handle the retry, and they retry is told to get a new non-persistent connection (which works). It’s at this point we could “invalidate” the PDO connection so the next run of a script does not repeat this cycle over and over.

The failover can happen at any time, so we’re not in a position to do manual actions such as restart php (as we had to do this time).

Without persistent connections, performance is notably slower.

FastCGI, Centos 16, PHP 7.2, MySQLD 5.0.12-dev (which is normal on Centos – see https://superuser.com/questions/1433346/php-shows-outdated-mysqlnd-version)

2

Answers


  1. Persistent connections must be terminated and restarted.

    Reminds me of a 2-minute TTL that took 20 minutes to be recognized. I don’t know whether Amazon does a better job, or even if they have any say in DNS.

    5.0.12?? That was released in 2005! Maybe a typo. Anyway, I don’t think the version matters in this Question.

    DNS may not be the optimal way to failover; there are several Proxy servers out there. I would expect them to flip within seconds. However, they need to know who’s who rather than depending on DNS.

    Can you modify the code to disconnect+reconnect when that error occurs? (It may not help.)

    Login or Signup to reply.
  2. Unfortunately, this error is documented:

    https://github.com/jeremydaly/serverless-mysql/issues/7

    everything said revolves around migrating to: mysqlnd driver for mysqlnd_ms

    I will continue looking for a more efficient solution.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search