My website is on AWS EC2.
I checked the TTFB (Time to First Byte) with this command:
curl --output /dev/null --silent --write-out "time_namelookup=%{time_namelookup}\ntime_connect=%{time_connect}\ntime_appconnect=%{time_appconnect}\ntime_pretransfer=%{time_pretransfer}\ntime_redirect=%{time_redirect}\ntime_starttransfer=%{time_starttransfer}\ntime_total=%{time_total}\n" --url http://13.37.46.163/
Here is the result when I run the command on my computer:
time_connect=0,014614
time_appconnect=0,000000
time_pretransfer=0,014657
time_redirect=0,000000
time_starttransfer=0,119092
time_total=0,134436
Here is the result when I run the command on the web server itself:
time_namelookup=0.000058
time_connect=0.001296
time_appconnect=0.000000
time_pretransfer=0.001336
time_redirect=0.000000
time_starttransfer=0.084576
time_total=0.085031
I noticed that in both cases, the longest time is time_starttransfer.
How can I reduce this time?
What is time_starttransfer?
The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
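Going by that definition, a rough way to isolate the wait for the response is to subtract time_pretransfer from time_starttransfer. This is only a sketch: the helper name server_wait is hypothetical, and the comma handling assumes curl printed decimal commas on my computer because of its locale. What remains still includes one network round trip, not just the server's computation.

```shell
# Hypothetical helper: seconds between "ready to send the request" and
# "first byte received". Accepts decimal commas or decimal dots.
server_wait() {
  pre=$(printf '%s' "$1" | tr ',' '.')     # time_pretransfer
  start=$(printf '%s' "$2" | tr ',' '.')   # time_starttransfer
  LC_ALL=C awk -v p="$pre" -v s="$start" 'BEGIN { printf "%.6f\n", s - p }'
}

server_wait 0,014657 0,119092   # run from my computer: prints 0.104435
server_wait 0.001336 0.084576   # run on the web server: prints 0.083240
```

In both runs almost all of the time is spent waiting for the response rather than connecting.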
My website config
My website link is: http://13.37.46.163/
It is a Grav CMS site which runs on EC2 + ServerPilot + PHP 7
Amazon Machine Image (AMI)
Ubuntu Server 20.04 LTS (HVM),EBS General Purpose (SSD) Volume Type. 64 bits (x86)
EC2 instance type
t2.micro
Web server
Nginx
Programming language
PHP
Reverse proxy
Nginx
Caching
I already use OPcache, which is enabled, as you can see here: http://13.37.46.163/info.php#module_zend+opcache
As for a CDN, I already use the Grav CDN Plugin (https://github.com/getgrav/grav-plugin-cdn).
My website logs (requests/min.)
1 00:02
1 00:38
1 00:54
1 01:06
1 01:12
1 01:23
1 03:49
1 04:32
1 04:57
6 05:15
1 05:17
1 05:31
1 05:37
1 06:08
1 06:32
1 07:30
1 07:38
1 07:55
1 08:31
1 10:07
1 10:35
1 10:52
1 10:59
1 12:53
1 13:00
1 14:18
1 14:28
1 14:29
1 14:48
1 16:05
1 18:40
1 19:20
1 20:24
1 20:30
i.e., rarely more than 1 request in any given minute (39 requests over about 20.5 hours, or roughly 2 per hour)
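For reference, a per-minute count like the one above can be produced from the access log with something along these lines. This is a sketch that assumes Nginx's default "combined" log format, where field 4 looks like [10/Oct/2020:05:15:01. The function name and the access.log file name are placeholders for illustration.

```shell
# Hypothetical helper: count requests per minute from access-log lines on stdin.
# Assumes field 4 is "[dd/Mon/yyyy:HH:MM:SS", so characters 14-18 are "HH:MM".
requests_per_minute() {
  awk '{ print substr($4, 14, 5) }' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Example (placeholder file name):
# requests_per_minute < access.log
```

If the log spans several days, identical HH:MM values from different days are merged into one row.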
Test(s) performed
- Running the TTFB test against a static file that PHP does not serve
I ran the TTFB test on the 'main.js' file.
Here is the result:
time_namelookup=0.000034
time_connect=0.002659
time_appconnect=0.000000
time_pretransfer=0.002702
time_redirect=0.000000
time_starttransfer=0.003983
time_total=0.004026
Analysis of the result:
The result is satisfactory (time_starttransfer=0.003983), although part of that may be because the file is small compared to the full page.
Still, we can deduce that the problem lies with PHP rather than with Nginx.
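Since a single curl run can be noisy, one possible way to firm up this comparison is to sample time_starttransfer several times per URL and look at the spread. The function names and the default of 10 samples are assumptions for illustration.

```shell
# Summarize one timing per line on stdin as min/avg/max (seconds).
# Decimal commas are converted to dots first.
summarize() {
  LC_ALL=C awk '{ gsub(",", "."); v = $1 + 0
       if (NR == 1 || v < min) min = v
       if (NR == 1 || v > max) max = v
       sum += v }
     END { if (NR) printf "min=%.6f avg=%.6f max=%.6f\n", min, sum / NR, max }'
}

# Print time_starttransfer for n requests to a URL (n defaults to 10).
sample_ttfb() {
  url=$1; n=${2:-10}; i=0
  while [ "$i" -lt "$n" ]; do
    curl --output /dev/null --silent --write-out '%{time_starttransfer}\n' --url "$url"
    i=$((i + 1))
  done
}

# Example:
# sample_ttfb http://13.37.46.163/ 20 | summarize
```

Running the same pipeline against the static file and against the PHP page would show whether the gap holds across many requests.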
- Running the top and free commands to check what is running and what is using resources (what don't I need?)
Here is the result of the top command:
+-----------+--------------+-------------+-------------+------------------+---------+---------+---------+--------+
| %Cpu(s): | 4.0 us, | 0.3 sy, | 0.0 ni, | 95.7 id, | 0.0 wa, | 0.0 hi, | 0.0 si, | 0.0 st |
+-----------+--------------+-------------+-------------+------------------+---------+---------+---------+--------+
| MiB Mem : | 978.6 total, | 75.8 free, | 332.2 used, | 570.6 buff/cache | | | | |
+-----------+--------------+-------------+-------------+------------------+---------+---------+---------+--------+
| MiB Swap: | 512.0 total, | 427.2 free, | 84.8 used. | 461.7 avail Mem | | | | |
+-----------+--------------+-------------+-------------+------------------+---------+---------+---------+--------+
I captured this output while reloading my website, to check the CPU usage.
Here is the result of the free command:
+-------+---------+--------+--------+--------+------------+-----------+
| | total | used | free | shared | buff/cache | available |
+-------+---------+--------+--------+--------+------------+-----------+
| Mem: | 1002052 | 334392 | 83368 | 16940 | 584292 | 478628 |
+-------+---------+--------+--------+--------+------------+-----------+
| Swap: | 524284 | 86784 | 437500 | | | |
+-------+---------+--------+--------+--------+------------+-----------+
Analysis of the results:
Maybe I should use t3.micro rather than t2.micro: slightly faster and slightly cheaper (?).
2 Answers
To improve performance and decrease the TTFB, I made these changes:
1 - PHP caching is critical
You should run a PHP opcode cache (OPcache) and a user cache (such as APCu) to get the best performance.
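A quick way to confirm that both caches are actually loaded is to look at PHP's module list. This is only a sketch: missing_modules is a hypothetical helper, and the php CLI may load a different configuration from the PHP-FPM pool that serves the site, so a phpinfo() page served by the site remains the more reliable check.

```shell
# Hypothetical helper: print "missing: NAME" for every named extension
# that does not appear as a whole line in the module list read from stdin.
missing_modules() {
  list=$(cat)
  for m in "$@"; do
    printf '%s\n' "$list" | grep -qixF -- "$m" || printf 'missing: %s\n' "$m"
  done
}

# Example (assumes the php CLI is installed):
# php -m | missing_modules "Zend OPcache" apcu
```

No output means every named extension was found.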
2 - SSD drives
SSD drives can make a big difference. Most things can be cached in the PHP user cache, but some are stored as files, so SSD drives can have a big impact on performance. Avoid using network filesystems such as NFS.
3 - Cleaning the CSS
UnCSS is particularly important here. This tool examines all the CSS selectors used in a set of files and removes every selector that is not in use. You might think this sounds error-prone and unnecessary, but used intelligently it's the most efficient reduction of a CSS file possible.
4 - Optimizing the server
The server I host on also supports gzip compression, and I set Expires headers so that the browser does not reload files unnecessarily.
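To check whether compression and expiry headers are really being sent, one option is to dump the response headers with curl and keep only the relevant ones. This is a sketch; caching_headers is a hypothetical helper name. It uses a normal GET request rather than HEAD, on the assumption that the server may only compress responses that have a body.

```shell
# Keep only the response headers related to compression and browser caching.
# Header names are matched case-insensitively; carriage returns are stripped.
caching_headers() {
  tr -d '\r' | grep -iE '^(content-encoding|cache-control|expires):'
}

# Example:
# curl --silent --output /dev/null --dump-header - \
#      --header 'Accept-Encoding: gzip' --url http://13.37.46.163/ | caching_headers
```

An empty result means none of the three headers was present in the response.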
5 - Use a CDN
A CDN (content delivery network) such as CloudFront, Cloudflare or MaxCDN can be used to cache data closer to users. Non-cached content is retrieved from the origin.
Using a CDN can cut asset delivery time substantially, in some cases from 30 seconds to 3.
For CloudFront users: don't hesitate to configure the CDN to cache your dynamic content https://www.youtube.com/watch?v=tqoDBNWBwas&t=2s
6 - Choose the right instance family (for AWS users)
For a very small website, you should prefer t3.micro over t2.micro: it is slightly faster and cheaper.
First: Generally speaking, unless you run an OS that hasn't been patched to support T3, you should prefer T3 over T2 (especially on Linux; I have seen some discussion about minor cost advantages for T2 on Windows). The slight reduction in price is, in my opinion, meant to get you to use T3 over T2 so that AWS can eventually retire T2. T3 runs on the Nitro system, which is generally better (faster), especially for network I/O, although I wouldn't expect that to affect your test. (BTW, if you are really looking for cheap, I have had good luck with the T3a instances, which are even lower in price.)
Second: You are using the T family of instances. From AWS:
That is all very nice speak for "there are a lot of users on the same physical machine". Of course that is true for a lot of the other families too, but in this case you aren't 'assigned' a core to use. You, and a lot of other people, are telling AWS that your workload isn't all that high and that you would like a cheaper instance in exchange for only using the CPU occasionally. That is fine, but AWS is trying to make money here and isn't giving you a dedicated CPU for your T instance (again, that is the choice you made). In return, the CPU might not be available the millisecond you want it, and the instance may need to wait until the requested resources are free on the physical host. Depending on how many other people are on that host and how over-provisioned it is, your results may vary.
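One way to see whether the instance is actually waiting for CPU is the steal figure (the "st" column in top). As a rough sketch, it can also be computed from two samples of the aggregate cpu line in /proc/stat. The helper name steal_percent and the 5-second interval are assumptions for illustration.

```shell
# Percentage of CPU time stolen by the hypervisor between two "cpu ..." lines
# read from stdin (first line = earlier sample, second line = later sample).
# Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
steal_percent() {
  LC_ALL=C awk '$1 == "cpu" {
      total = 0
      for (i = 2; i <= 9; i++) total += $i
      if (seen++) printf "%.1f\n", 100 * ($9 - steal0) / (total - total0)
      steal0 = $9; total0 = total
    }'
}

# Example (Linux only):
# { grep '^cpu ' /proc/stat; sleep 5; grep '^cpu ' /proc/stat; } | steal_percent
```

A value that stays near zero while the site is slow suggests the delay comes from the application rather than from contention on the host.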
To my knowledge, AWS doesn’t publish any information on how over-provisioned a T instance is. If you suspect you may have chatty neighbors, you could always switch to a different physical machine by stopping and starting the instance (you do not need to terminate the instance). That should switch which physical host you are running on, but there are no guarantees you will get a better machine. Intrinsically, you are asking for best-in-class performance from the cheapest-in-class instance family. That likely won’t work out to your expectations.
In short, if you want minimum latency and guaranteed speed, you will need to switch to a different instance family. A general purpose family such as M5 may be more suitable if you need more consistent, lower-latency performance.