I have written a pretty basic Rust server that makes an HTTP call to a URL defined in an environment variable, parses the JSON response, and returns it.
Source Code | Flamegraph – IO Call | Dockerfile | Flamegraph – Static Json
Nginx request throughput is ~30000 requests/second. Sample request:
curl -v 172.31.50.91/
* Trying 172.31.50.91...
* TCP_NODELAY set
* Connected to 172.31.50.91 (172.31.50.91) port 80 (#0)
> GET / HTTP/1.1
> Host: 172.31.50.91
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.20.0
< Date: Sun, 05 Mar 2023 07:58:39 GMT
< Content-Type: application/octet-stream
< Content-Length: 36
< Connection: keep-alive
< Content-Type: application/json
<
* Connection #0 to host 172.31.50.91 left intact
{"status": 200, "msg": "Good Luck!"}
Nginx and the Rust server are running on separate EC2 instances (c6a.large) in the same network.
The Rust server exposes 2 APIs:
- Returns a static response => throughput 47000 requests/second
- Makes an HTTP request to the nginx server -> parses the JSON response -> returns the parsed data => throughput 2462 requests/second. [Issue] (Rough wiring is sketched below.)
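For context, the two handlers are wired up roughly like this. This is a minimal sketch, not the exact source: it assumes axum 0.6 (matching the State extractor used by io_call further down), and static_json plus the AppState field names are illustrative.

use axum::{routing::get, Json, Router};
use serde::{Deserialize, Serialize};
use std::env;

#[derive(Clone)]
struct AppState {
    external_url: String,
}

#[derive(Serialize, Deserialize)]
pub struct IOCall {
    status: u16,
    msg: String,
}

// Static handler: serializes a constant payload, no network hop.
async fn static_json() -> Json<IOCall> {
    Json(IOCall { status: 200, msg: "Good Luck!".into() })
}

#[tokio::main]
async fn main() {
    let state = AppState {
        external_url: env::var("EXTERNAL_URL").unwrap(),
    };
    let app = Router::new()
        .route("/static", get(static_json))
        // .route("/io", get(io_call)) // io_call is shown further below
        .with_state(state);
    axum::Server::bind(&"0.0.0.0:80".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}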
A similar benchmark was done in Go, using Fiber as the HTTP server in prefork mode and json-iterator for parsing. It is able to call nginx and return the response at ~20000 requests/second, which means there are no issues with the infra/Docker/client used to test the Rust server.
There must be something missing in my Rust code that causes such a regression when the HTTP call is introduced.
I need help understanding why this happens and how to improve the Rust code.
Benchmark result
[ec2-user@ip-172-31-50-91 ~]$ hey -z 10s http://172.31.50.22:80/io
Summary:
Total: 10.0168 secs
Slowest: 0.0692 secs
Fastest: 0.0006 secs
Average: 0.0203 secs
Requests/sec: 2462.4534
Total data: 813978 bytes
Size/request: 33 bytes
Response time histogram:
0.001 [1] |
0.007 [12766] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.014 [1185] |■■■■
0.021 [227] |■
0.028 [494] |■■
0.035 [1849] |■■■■■■
0.042 [3840] |■■■■■■■■■■■■
0.049 [3127] |■■■■■■■■■■
0.055 [992] |■■■
0.062 [174] |■
0.069 [11] |
My attempts to improve performance:
- Both the Go and Rust servers run in Docker containers on the same instance, one at a time.
- System ulimit / somaxconn have been raised so they don't create a bottleneck; since the static response reaches 47K rps, they shouldn't be the limitation.
- Moved the external URL to lazy_static, but it didn't improve performance:
use lazy_static::lazy_static;
use std::env;

lazy_static! {
    static ref EXTERNAL_URL: String = env::var("EXTERNAL_URL").unwrap();
}
- Tried changing the tokio flavor config with worker_threads = 2, 10, and 16 – it didn't improve perf:
#[tokio::main(flavor = "multi_thread", worker_threads = 10)]
- Looked into making sure the hyper network call is done in a tokio-async-compatible way. Earlier it managed 247 requests/second; moving to stream-based response processing improved the IO call 10x, reaching ~2400, but there is still scope to improve.
IO Call API – GitHub Link
use axum::{extract::State, Json};
use hyper::{body::Buf, Client};

pub async fn io_call(State(state): State<AppState>) -> Json<IOCall> {
    let external_url = state.external_url.parse().unwrap();
    // A fresh client (with its own empty connection pool) is built per request.
    let client = Client::new();
    let resp = client.get(external_url).await.unwrap();
    // Aggregate the body chunks into a Buf rather than buffering contiguously.
    let body = hyper::body::aggregate(resp).await.unwrap();
    Json(serde_json::from_reader(body.reader()).unwrap())
}
This use case is very similar to a reverse proxy server.
2 Answers
Thanks to @kmdreko:
Moving the hyper client initialization into AppState resolved the problem.
Git diff
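For reference, the shape of that fix: a sketch assuming hyper 0.14 and the same axum State extractor as io_call above; the AppState fields are illustrative, not the exact diff.

use axum::{extract::State, Json};
use hyper::{body::Buf, client::HttpConnector, Client};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct IOCall {
    status: u16,
    msg: String,
}

#[derive(Clone)]
struct AppState {
    external_url: String,
    // Built once at startup with Client::new(); clones share the connection pool.
    client: Client<HttpConnector>,
}

pub async fn io_call(State(state): State<AppState>) -> Json<IOCall> {
    let external_url = state.external_url.parse().unwrap();
    // Reuses pooled keep-alive connections instead of opening a new TCP
    // connection to nginx on every request.
    let resp = state.client.get(external_url).await.unwrap();
    let body = hyper::body::aggregate(resp).await.unwrap();
    Json(serde_json::from_reader(body.reader()).unwrap())
}

Client::new() sets up a connection pool; constructing it inside the handler meant every request paid for a fresh TCP handshake to nginx, while a client shared through AppState keeps connections alive across requests.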
You need to narrow down the possible cause. Here are a few pointers that should lead you in the right direction.
Server
Test client
Is it possible that your test client is the bottleneck?
Network
The following shouldn’t be the case if you’ve run the static/Go tests with the same setup, but just in case:
If the above doesn't help, try testing the same setup locally and see what you can achieve. If there's no difference, try with smaller JSON payloads and see if that helps.
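For the local test, a minimal stand-in for the nginx upstream could look like the sketch below (assuming hyper 0.14; the port and handler name are illustrative). It serves the same 36-byte JSON, so the /io path can be benchmarked with no real network hop:

use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

// Always returns the same JSON payload, mimicking the nginx endpoint.
async fn upstream(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    Ok(Response::builder()
        .header("Content-Type", "application/json")
        .body(Body::from(r#"{"status": 200, "msg": "Good Luck!"}"#))
        .unwrap())
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 8080));
    let make_svc = make_service_fn(|_| async { Ok::<_, Infallible>(service_fn(upstream)) });
    Server::bind(&addr).serve(make_svc).await.unwrap();
}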