I used the netperf benchmark with the following commands:
server side:
netserver -4 -v -d -N -p
client side:
netperf -H -p -l 60 -T 1,1 -t TCP_RR
I received the following results:
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    9147.83
16384  131072
But then I changed the client to a single CPU (same machine) by adding "maxcpus=1 nr_cpus=1" to the kernel command line and ran the following command:
netperf -H -p -l 60 -t TCP_RR
I received the following results:
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    10183.33
16384  131072
Q: I don't understand how performance improved when I decreased the number of CPUs from 64 to 1?
Some technical information: I used the Standard_L64s_v3 instance type on Azure; OS: sles:15:sp2
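For reference, applying and verifying these kernel parameters with GRUB2 (the SLES 15 default; the paths are assumptions about a standard setup) looks roughly like:

# append maxcpus=1 nr_cpus=1 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
grub2-mkconfig -o /boot/grub2/grub.cfg
# after reboot:
cat /proc/cmdline   # should show the new parameters
nproc               # should print 1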
2 Answers
• The netperf command you executed on the client side is essentially the same after changing the number of CPUs, yet you see an improvement in performance after decreasing the number of vCPUs on the client VM.
The command tests TCP request/response network performance between the server and the client over a period of 60 seconds.
• Netperf's CPU utilization measurement mechanism uses '/proc/stat' on Linux to record the time spent. The code for this mechanism is in 'src/netcpu_procstat.c', so you can check there for the details.
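For illustration, the counters that mechanism reads look like this (field layout per the proc(5) man page; note the 'steal' column, which is exactly the time a guest loses to the hypervisor):

head -2 /proc/stat
# cpu   user nice system idle iowait irq softirq steal guest guest_nice
# cpu0  ...   (per-CPU line; values are cumulative ticks in USER_HZ units)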
Also, the CPU utilization mechanism in a virtual guest environment (a virtual machine) may not reflect actual utilization the way it does on bare metal, because much of the network processing happens outside the context of the virtual machine. See the netperf documentation from Hewlett-Packard:
https://hewlettpackard.github.io/netperf/doc/netperf.html
As a result, I would suggest relying on the monitoring tools available in Azure, e.g., Azure Monitor, Application Insights, etc.
Looking more closely at your netperf command line:
netperf -H -p -l 60 -T 1,1 -t TCP_RR
The -H option expects a hostname as an argument, and the -p option expects a port number as an argument. As written, the "-p" will be interpreted as the hostname, and when I tried it, it failed. I assume you've omitted some of the command line?
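For reference, with the missing arguments filled in, the pair of commands would look something like this (10.0.0.28 is taken from your results header; the control port is an assumption, 12865 being netperf's default):

netserver -4 -p 12865
netperf -H 10.0.0.28 -p 12865 -l 60 -T 1,1 -t TCP_RR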
The -T option binds where netperf and netserver run (in this case vCPU 1 on the netperf side and vCPU 1 on the netserver side), but it will not necessarily control where at least some of the network stack processing takes place. So, in your 64-vCPU setup, the interrupts for the networking traffic, and perhaps the stack, will run on a different vCPU. In your 1-vCPU setup, everything is on the one vCPU. It is quite conceivable you are seeing the effects of cache-to-cache transfers in the 64-vCPU case leading to lower transaction/s rates.
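If you want to see this directly, something like the following on the 64-vCPU client during a run shows which vCPUs service the NIC interrupts versus the bound netperf process (the grep pattern is a guess at the NIC driver name; mpstat comes from the sysstat package):

grep -iE 'mlx|virtio|eth' /proc/interrupts   # per-CPU interrupt counts per NIC queue
mpstat -P ALL 1                              # per-CPU utilization, 1-second samples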
Going multi-processor will increase aggregate performance, but it will not necessarily increase single-thread/stream performance; indeed, single-thread/stream performance can degrade.
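To illustrate the aggregate-versus-single-stream distinction, a rough sketch: run several concurrent TCP_RR streams and sum their transaction rates (reusing the assumed -H/-p arguments from above):

for i in 1 2 3 4; do
  netperf -H 10.0.0.28 -p 12865 -l 60 -t TCP_RR &
done
wait
# On the 64-vCPU guest the summed rate should scale with the stream count,
# even though each individual stream may be slower than the 1-vCPU result.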