I have an m5.xlarge server running in AWS that I recently upgraded from Server 2012 R2 to Server 2022 Standard. It has a gp2, 1,200 IOPS, unencrypted, 400 GB drive attached to it. When I try to perform a task (e.g. a SQL backup), the disk read/write speed is around 2,000,000 B/sec (~2 MB/s).
I have another identical server on which I did the same upgrade, and it is getting 35,000,000 B/sec (~35 MB/s).
On the slow server I’ve tried:
- Attaching a new EBS drive (same config) – same read/write speed
- Disabling AV (Windows Defender) and security software (Sentinel)
- Ensuring VSS is disabled
No matter what action I perform (SQL backup, file copy, disk performance utility), it is still slow.
The old/original server (the new one was created by taking an AMI of it and launching a new instance) performs fast. So it does seem like something happened during the upgrade process (launching the instance, creating the drive, or actually upgrading Windows).
Any suggestions on things I can try to help get performance back to its old state?
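In case it helps, this is roughly the kind of sequential-write test I've been running to get the B/sec numbers above; it's a minimal Python sketch, and the scratch path (D:\bench.tmp) is just a placeholder for a file on the affected volume:

```python
# Minimal sequential-write throughput test (sketch).
# D:\bench.tmp is a placeholder path on the volume under test.
# Note: this still goes through the OS file cache, so results are approximate.
import os
import time

PATH = r"D:\bench.tmp"       # file on the EBS volume being measured
BLOCK = 1024 * 1024          # 1 MiB writes
TOTAL = 256 * 1024 * 1024    # write 256 MiB in total

buf = os.urandom(BLOCK)
start = time.perf_counter()
with open(PATH, "wb", buffering=0) as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += BLOCK
    os.fsync(f.fileno())     # force the data to disk before timing stops
elapsed = time.perf_counter() - start

print(f"{TOTAL / elapsed:,.0f} B/sec")
os.remove(PATH)
```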
2 Answers
Is gp2 still the appropriate EBS volume type for your workload?
For high-performance requirements, you might consider Provisioned IOPS SSD volumes (io1 or io2) instead of gp2 (General Purpose SSD).
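If you decide to test that, you can change the volume type in place with Elastic Volumes, no detach needed. A minimal boto3 sketch, where the volume ID and IOPS value are placeholders:

```python
# Sketch: convert a gp2 volume to io2 in place (Elastic Volumes).
# The volume ID and IOPS value are placeholders - pick IOPS for your workload.
import boto3

ec2 = boto3.client("ec2")
resp = ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # placeholder
    VolumeType="io2",
    Iops=3000,                         # provisioned IOPS to test against gp2's 1,200
)
print(resp["VolumeModification"]["ModificationState"])
```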
Also make sure that the latest AWS drivers are installed on the Windows Server; outdated drivers can cause performance issues. On a Nitro-based instance type like m5, EBS volumes are exposed as NVMe devices, so the AWS NVMe driver is the relevant one (see "AWS NVMe drivers for Windows instances"); the PV (Paravirtual) drivers apply to older Xen-based instances (see "Upgrade PV drivers on Windows instances").
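One quick way to see which AWS storage/network drivers (and versions) are actually loaded is to query Win32_PnPSignedDriver from inside the guest. A sketch that shells out to PowerShell from Python; the 'AWS|ENA|NVMe' name filter is an assumption about how the drivers are labeled and may need adjusting:

```python
# Sketch: list AWS-related drivers and versions on the Windows guest.
# The 'AWS|ENA|NVMe' filter is an assumption about driver naming.
import subprocess

ps = (
    "Get-CimInstance Win32_PnPSignedDriver | "
    "Where-Object { $_.DeviceName -match 'AWS|ENA|NVMe' } | "
    "Select-Object DeviceName, DriverVersion | Format-Table -AutoSize"
)
print(subprocess.run(
    ["powershell", "-NoProfile", "-Command", ps],
    capture_output=True, text=True, check=True,
).stdout)
```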
Also check the Windows Server for any misconfigured system settings that could impact disk performance, including the power plan (set it to High Performance) and disk write caching.
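Checking and switching the power plan can be scripted as well; a sketch that shells out to powercfg (the GUID is the well-known built-in High performance scheme):

```python
# Sketch: show the active power plan and switch to High performance.
import subprocess

HIGH_PERF = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"  # built-in High performance GUID

# Show what is currently active.
print(subprocess.run(["powercfg", "/getactivescheme"],
                     capture_output=True, text=True).stdout)

# Activate the High performance plan.
subprocess.run(["powercfg", "/setactive", HIGH_PERF], check=True)
```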
Verify that the instance is EBS-optimized to ensure maximum performance of your EBS volumes (m5 instances are EBS-optimized by default, but it's worth confirming).
Since EC2 instances have network and EBS bandwidth limits tied to instance size, make sure those limits are not capping EBS performance.
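Both of those checks are scriptable from the API. A boto3 sketch that confirms the EBS-optimized flag and looks up the published EBS throughput limits for m5.xlarge (the instance ID is a placeholder):

```python
# Sketch: confirm EBS optimization and look up the instance type's EBS limits.
import boto3

ec2 = boto3.client("ec2")

# Is this instance EBS-optimized? (m5 instances are EBS-optimized by default.)
attr = ec2.describe_instance_attribute(
    InstanceId="i-0123456789abcdef0",  # placeholder
    Attribute="ebsOptimized",
)
print("EbsOptimized:", attr["EbsOptimized"]["Value"])

# What EBS throughput is the instance type itself capable of?
info = ec2.describe_instance_types(InstanceTypes=["m5.xlarge"])
ebs = info["InstanceTypes"][0]["EbsInfo"]["EbsOptimizedInfo"]
print("Baseline throughput (MB/s):", ebs["BaselineThroughputInMBps"])
print("Maximum throughput (MB/s):", ebs["MaximumThroughputInMBps"])
```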
But start with measuring and monitoring: that way, you can measure before and after making changes to your new configuration.
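A baseline is easy to pull from CloudWatch before you change anything. A minimal boto3 sketch that reads recent EBS write throughput for the volume over the last hour (the volume ID is a placeholder):

```python
# Sketch: pull recent EBS throughput from CloudWatch for a before/after baseline.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cw.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="VolumeWriteBytes",
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],  # placeholder
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                 # 5-minute buckets
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    mb_per_sec = point["Sum"] / 300 / 1_000_000
    print(point["Timestamp"], f"{mb_per_sec:.1f} MB/s")
```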
I would try remoting into each server and see how long that takes over the network between the two.
Are both servers serviced by the same network and located in the same tree? Are there any hops in between?
What are the disk cache settings on the servers (turned on or off)?
What are the average disk queue length and pages/sec on the slower server? (See the sketch after these questions for one way to sample them.)
If the slower instance was not provisioned in the same manner, it may be missing some I/O and network support drivers.
How old are the servers?
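For the queue-length and pages/sec question above, one way to sample those counters from inside the guest is to shell out to typeperf; the counter paths assume the default English counter names:

```python
# Sketch: sample disk queue length and memory pages/sec via typeperf.
# Counter paths assume English counter names on the guest.
import subprocess

counters = [
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Memory\Pages/sec",
]
# Take 10 samples, one per second, and print the CSV output.
out = subprocess.run(
    ["typeperf", *counters, "-sc", "10"],
    capture_output=True, text=True,
)
print(out.stdout)
```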
A performance checks guide can be found here: Virtualization Performance Check
You could also roll back the upgrade and test speeds and throughput as they were originally. I'm confused about which is slower here, though: the AMI-provisioned server or the older server from your question?