We have a large AWS RDS instance running, but recently we have seen poor disk performance. We have narrowed it down to our EBSByteBalance% metric dropping to 0, resulting in high latency for reads and writes to the disk.
According to the docs for the underlying gp2 storage, we should have a baseline of 12k IOPS, since gp2 provides 3 IOPS per GiB and we have over 4,000 GB of storage.
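For reference, the gp2 baseline scales linearly with volume size (3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS), which is where the 12k figure comes from. A quick sketch of that formula:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 volumes get 3 IOPS per GiB, with a minimum of 100 and a cap of 16,000."""
    return min(max(3 * size_gib, 100), 16_000)

print(gp2_baseline_iops(4_000))  # → 12000, matching the 12k baseline above
```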
If I check the metrics, it seems like we never hit that 12k IOPS limit, yet our EBSByteBalance% still gets drained:
Am I misinterpreting the available IOPS (12k) that we have?
2 Answers
This turned out to be caused by the limited network capacity between the RDS instance and the disk. The graph of our network traffic shows that we do exceed the provisioned amount (2,780 Mbps for our instance type).
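This is consistent with EBSByteBalance% behaving like a token bucket on the instance's EBS throughput: it refills at the instance's baseline bandwidth and drains whenever sustained traffic exceeds it, regardless of whether the volume's IOPS limit is reached. A rough illustrative model (the 2,780 Mbps baseline is from our instance type above; the bucket capacity and traffic figures are made-up numbers for illustration, not documented AWS values):

```python
MBIT_TO_MBYTE = 1 / 8  # megabits to megabytes

def byte_balance_after(baseline_mbps: float, actual_mbps: float,
                       seconds: int, bucket_mbytes: float,
                       start_pct: float = 100.0) -> float:
    """Crude token-bucket model of EBSByteBalance%: the bucket refills at the
    instance's baseline EBS bandwidth and drains at the actual throughput.
    bucket_mbytes is a hypothetical capacity chosen for illustration only."""
    net_mbytes_per_s = (baseline_mbps - actual_mbps) * MBIT_TO_MBYTE
    balance = start_pct / 100 * bucket_mbytes + net_mbytes_per_s * seconds
    balance = min(max(balance, 0.0), bucket_mbytes)
    return 100 * balance / bucket_mbytes

# Half an hour of sustained traffic above the 2,780 Mbps instance limit drains
# the balance even though the volume's 12k IOPS limit is never touched:
print(byte_balance_after(2_780, 3_200, seconds=1_800, bucket_mbytes=120_000))  # → 21.25
```

The takeaway is that the drain is driven by bytes per second at the instance level, not by IOPS at the volume level, so the volume-side metrics can look fine while the balance empties.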
The question remains how to fix this issue, since the instance type is already maxed out.
What instance type are you using? Even when you have enough IOPS, heavy read and write activity can also eat into available RAM, pushing more traffic to disk. Try moving to a larger instance type than your current one.