
I used the netperf benchmark with the following commands:

server side:
netserver -4 -v -d -N -p

client side:
netperf -H -p -l 60 -T 1,1 -t TCP_RR

And I received the results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    9147.83
16384  131072

But then I changed the client to a single CPU (same machine) by adding "maxcpus=1 nr_cpus=1" to the kernel command line, and ran the following command:

netperf -H -p -l 60 -t TCP_RR

I received the following results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    10183.33
16384  131072
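
A quick sanity check that the single-CPU boot actually took effect (an illustrative sketch; exact output varies by machine):

    # Verify the boot parameters were applied and only one CPU is online
    cat /proc/cmdline    # should include maxcpus=1 nr_cpus=1
    nproc                # should print 1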

Q: How did performance improve when I decreased the number of CPUs from 64 to 1?

Some technical information: I used the Standard_L64s_v3 instance type on Azure; OS: sles:15:sp2.

2 Answers


  1. • The netperf command you executed on the client side is shown below. It is essentially the same before and after changing the number of CPUs, yet you see a performance improvement after decreasing the number of vCPUs on the client VM:

    netperf -H -p -l 60 -T 1,1 -t TCP_RR
    

    The above command runs a TCP request/response test between the 'server' and 'client' hosts and reports the results for a test period of 60 seconds.
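
    For reference, here is what each flag in that command line means (the host and port arguments are shown as placeholders, since they were omitted in the post):

        # netperf flags used here:
        #   -H <host>   address of the remote netserver (argument omitted in the post)
        #   -p <port>   netserver control port (argument omitted in the post)
        #   -l 60       run the test for 60 seconds
        #   -T 1,1      bind netperf to vCPU 1 locally and netserver to vCPU 1 remotely
        #   -t TCP_RR   TCP request/response test, one transaction at a time
        netperf -H <host> -p <port> -l 60 -T 1,1 -t TCP_RR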

    • The CPU utilization measurement mechanism on Linux reads /proc/stat to record the time spent during such command executions. The code for this mechanism lives in src/netcpu_procstat.c, so you can inspect that source file to see exactly what is being measured.
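
    As a minimal illustration of the kind of counters that mechanism reads (a sketch in shell, not netperf's actual code):

        # Sample the aggregate "cpu" line of /proc/stat twice, one second
        # apart, and estimate the idle fraction from the jiffy deltas.
        # (Rough: the total here ignores iowait/irq/softirq/steal.)
        read -r _ u1 n1 s1 i1 _ < /proc/stat
        sleep 1
        read -r _ u2 n2 s2 i2 _ < /proc/stat
        total=$(( (u2 + n2 + s2 + i2) - (u1 + n1 + s1 + i1) ))
        idle=$(( i2 - i1 ))
        echo "approx idle: $(( 100 * idle / total ))%"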

    Also, the CPU utilization reported inside a virtual guest environment (i.e., a virtual machine) may not reflect the actual utilization the way it does on bare metal, because much of the network processing happens outside the context of the virtual machine. As the netperf documentation from Hewlett Packard puts it:

    https://hewlettpackard.github.io/netperf/doc/netperf.html

    If one is looking to measure the added overhead of a virtualization mechanism, rather than rely on CPU utilization, one can rely instead on netperf _RR tests – path-lengths and overheads can be a significant fraction of the latency, so increases in overhead should appear as decreases in transaction rate. Whatever you do, DO NOT rely on the throughput of a _STREAM test. Achieving link-rate can be done via a multitude of options that mask overhead rather than eliminate it.

    As a result, I would suggest relying on the other monitoring tools available in Azure, e.g., Azure Monitor, Application Insights, etc.

  2. Looking more closely at your netperf command line:

    netperf -H -p -l 60 -T 1,1 -t TCP_RR

    The -H option expects a hostname as an argument, and the -p option expects a port number. As written, the "-p" will be interpreted as a hostname, and when I tried it, the command simply failed. I assume you've omitted parts of the command line?
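
    For illustration, a complete invocation pair would look something like this (the port number 16604 is a hypothetical placeholder; 10.0.0.28 is the server address taken from your output):

        # server side: listen on an explicit control port
        netserver -4 -v -d -N -p 16604

        # client side: point netperf at the server's address and that port
        netperf -H 10.0.0.28 -p 16604 -l 60 -T 1,1 -t TCP_RR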

    The -T option binds where netperf and netserver run (in this case vCPU 1 on the netperf side and vCPU 1 on the netserver side), but it does not necessarily control where at least some of the network stack processing takes place. So, in your 64-vCPU setup, the interrupts for the network traffic, and perhaps parts of the stack, will run on a different vCPU; in your 1-vCPU setup, everything runs on the one vCPU. It is quite conceivable you are seeing the effects of cache-to-cache transfers in the 64-vCPU case leading to lower transaction/s rates.
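
    One way to see where the NIC's interrupts actually land is /proc/interrupts; a rough sketch (the interface name pattern and the IRQ number 24 are assumptions and will vary by VM):

        # Which vCPUs are servicing network interrupts? The per-CPU
        # columns show interrupt counts; the device name is at the right.
        grep -iE 'mlx|virtio|eth' /proc/interrupts

        # The allowed-CPU mask for a given IRQ (take the IRQ number
        # from the /proc/interrupts output; 24 is hypothetical):
        cat /proc/irq/24/smp_affinity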

    Going multi-processor increases aggregate performance, but it does not necessarily increase single-thread/stream performance. Single-thread/stream performance can indeed degrade.
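
    To put your two transaction rates in latency terms: each TCP_RR transaction is one request/response round trip, so the mean round-trip time is roughly the reciprocal of the rate. A back-of-the-envelope check:

        # Convert the reported transactions/s into mean microseconds
        # per round trip (rates taken from the two runs above):
        awk 'BEGIN {
            printf "64 vCPUs: %.1f us/transaction\n", 1e6 / 9147.83
            printf " 1 vCPU:  %.1f us/transaction\n", 1e6 / 10183.33
        }'
        # => ~109.3 us vs ~98.2 us: the single-CPU run saves about
        #    11 us of per-transaction overhead.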
