I have a pretty “basic” app that we designed that was originally on a local plesk server and we migrated to GAE/GSQL/GCS. app engine, mysql, cloud storage.
Here’s some background info:
App is PHP based, and runs great on the local server. When we migrate to the cloud we notice this random yet extremely latency that happens. It’s so bad that the app times out and gives a SPDY timeout error. We utilize cloudflare for SPDY assistance so we started there and they said it’s the the server. Then we went to google. We’ve been going back and forth back and forth and I am looking for other avenues of help.
I am running an app on a F2 standard GAE instance and a G1-small CloudSQL instance (gen 2). All same region/zone. There is also a failover sql instance.
There is really no pattern to it but users on the app notice a bad timeout very frequently and it dies after 60 seconds. (which points to a PHP timeout right? We checked the code and it runs fine on the local server)
I dont have a whole lot of traffic on this app yet (maybe a few users a day) so i dont know if it’s traffic load. Here’s some basic stats for you:
Some Google Engineers said our app has trouble scaling (QPS never will get about 1)
And asked if we are threading. We are not. We do not use memcache yet either.
I also see a ton of these:
Which looks like this bug: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/126
But I am unsure if this is all related.
We’ve tried going through Google’s tech support, they said we have “manual locks” but our dev team doesn’t agree nor know what this really means. Again, the same framework of the app (session handling etc) code is used in many apps with a ton of users on it (non GAE, they’re on compute on AWS) so this is our first venture to GAE.
We connect using standard MySQL connection parameters and use the same framework in a lot of applications and it runs fine. We use the required proxy to connect to CloudSQL.
The speed and constant lag shouldn’t be there. We don’t know what this issue could be. My questions are:
1) Do you see any issues here? All database logs are above and summaries
2) Can you help me understand what may be wrong here?
Thank you!
2
Answers
There was a query we found running that caused a huge database lag.
The biggest latency spike I can see from your Screenshot it’s about 20 seconds at 9:00 am, that its about the same time where you have the biggest amount of queries, read/write operations, and memory usage.
Even though you have a small amount of users they can be doing many queries, If GCP support suggest that it has problems scaling you can check the auto-scale property and see if it is enable.
From what i can see from your images and looking through the Cloud SQL docs I would suggest a horizontal scale of your Cloud SQL Instance.
Also take a look at diagnose-issues docs, maybe you can get more info on whats causing the MySQL aborted connections error.