iperf does not reach full speed on fast network connections such as 40 Gbps or 100 Gbps
Environment
- Red Hat Enterprise Linux
- High speed network interface such as 40 Gbps or 100 Gbps
- Bandwidth test such as iperf or netperf
Issue
- iperf does not reach full speed on fast network connections such as 40 Gbps or 100 Gbps
- Testing with iperf appears to hit a bottleneck or speed limit below the NIC's rated speed
Resolution
RHEL 7, or simulating programs without zero-copy
Run multiple test programs on multiple ports and multiple CPUs. Add up the total throughput of all tests.
For example, multiple iperf servers could be run like:
for PORT in {9001..9008}; do iperf3 --server --interval 10 --port "$PORT" & done
Clients could connect to each of these servers like:
for PORT in {9001..9008}; do iperf3 --title "test$PORT" --time 60 --interval 10 --port "$PORT" --client SERVERNAME --parallel 4 | tee "iperf$PORT".txt & done
(replace SERVERNAME with your server name)
Once the client tests are complete, add the client bandwidths to find the total result:
awk '/SUM.* 0.00-60.0.*sender/{i+=$7} END {print i}' iperf*.txt
This prints a single number such as 90.49, indicating that the transfers together averaged about 90 Gbits/sec.
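As a sanity check of the awk aggregation, it can be run against fabricated lines modeled on iperf3's client output (the exact column layout varies between iperf3 versions, so these lines are illustrative only). Note that the test title from --title prefixes each line, shifting the fields right by one so the bitrate value lands in $7:

```shell
# Two fabricated SUM lines in the shape produced by the client loop above;
# the "testNNNN:" prefix comes from --title and shifts each field right by one
printf '%s\n' \
  'test9001:  [SUM]   0.00-60.00  sec   79.0 GBytes  11.3 Gbits/sec  0  sender' \
  > iperf9001.txt
printf '%s\n' \
  'test9002:  [SUM]   0.00-60.00  sec   80.4 GBytes  11.5 Gbits/sec  0  sender' \
  > iperf9002.txt

# Sum the per-client bitrates ($7 = the Gbits/sec value on each SUM line)
awk '/SUM.* 0.00-60.0.*sender/{i+=$7} END {print i}' iperf*.txt
# → 22.8 (the summed Gbits/sec of both fabricated results)
```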
RHEL 8 and later, or simulating programs with zero-copy
Add the --zerocopy flag to iperf3. This depends on kernel support that is available in RHEL 8 and later.
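For example, the client loop from the previous section could be rerun with zero-copy sends enabled (SERVERNAME is again a placeholder for the server's hostname):

```shell
# Same client loop as before, adding --zerocopy (-Z) so iperf3 uses a
# zero-copy send path such as sendfile(2); requires kernel support
for PORT in {9001..9008}; do
  iperf3 --title "test$PORT" --time 60 --interval 10 --port "$PORT" \
         --client SERVERNAME --parallel 4 --zerocopy | tee "iperf$PORT".txt &
done
```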
Consider configuring a large MTU such as 9000-byte frames.
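A sketch of raising the MTU, assuming a NetworkManager-managed interface named enp1s0 (a placeholder); the MTU must also be raised on the peer and on any switches in the path:

```shell
# Raise the MTU immediately (this change is lost on reboot):
ip link set dev enp1s0 mtu 9000

# Persist the change via NetworkManager (connection name assumed to match
# the interface name):
nmcli connection modify enp1s0 802-3-ethernet.mtu 9000
nmcli connection up enp1s0

# Verify a jumbo frame reaches the far end without fragmentation
# (8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header):
ping -M do -s 8972 SERVERNAME
```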
Consider pinning workloads to specific CPUs and using a NIC with Accelerated Receive Flow Steering (ARFS) to steer each flow to the right CPU. ARFS is enabled with ethtool --offload DEV ntuple on (replace DEV with the interface name).
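Enabling the ntuple offload alone is generally not sufficient for ARFS; the kernel's flow-steering tables must also be sized. A sketch with example table sizes (the interface name and the values are placeholders to adjust for the system):

```shell
DEV=enp1s0                                  # placeholder interface name
ethtool --offload "$DEV" ntuple on          # enable ntuple filters / ARFS

# Size the global socket-flow table (example value):
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

# Set the per-receive-queue flow count (example value per queue):
for Q in /sys/class/net/"$DEV"/queues/rx-*/rps_flow_cnt; do
    echo 2048 > "$Q"
done
```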
Ensure the test program has NUMA locality with the network interface card. Do not cross NUMA nodes.
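The NIC's NUMA node can be read from sysfs, and the test pinned to it with numactl (enp1s0, node 0, and SERVERNAME are placeholders):

```shell
# Which NUMA node owns the NIC? (-1 means no NUMA affinity)
cat /sys/class/net/enp1s0/device/numa_node

# Bind the bandwidth test's CPUs and memory to that node, e.g. node 0:
numactl --cpunodebind=0 --membind=0 \
    iperf3 --client SERVERNAME --time 60 --parallel 4
```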
Root Cause
Even when the iperf3 --parallel flag is used, a single process manages all of the parallel streams.
At high enough data rates, this single process becomes the bottleneck, because it can only run as fast as one CPU allows.
Running multiple processes across multiple CPUs overcomes this limitation.
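One way to observe this (a hypothetical check, assuming iperf3 is running and the sysstat package provides pidstat) is to watch per-thread CPU usage during a test; an iperf3 process pegged near 100% of one CPU while the NIC is below wirespeed points at this limit:

```shell
# Per-thread CPU usage of all running iperf3 processes, sampled every second
# (pgrep -d, emits the PIDs as the comma-separated list that pidstat -p expects)
pidstat -t -p "$(pgrep -d, iperf3)" 1
```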
This is documented in the upstream iperf3 FAQ, and the knowledgebase of ESnet (the current iperf3 maintainer) explains it further.
This can be measured by manually adding up the speeds of multiple concurrent bandwidth tests, as printed by the iperf3 commands shown above, or by monitoring network bandwidth with a separate tool such as iptraf.
When testing this sort of high-speed network interface, the goal is usually not to achieve wirespeed on a single stream, but to achieve wirespeed across multiple streams.
As network interfaces get faster, it becomes difficult or even impossible to reach full wirespeed on a single CPU without zero-copy. A single test command without zero-copy is unlikely to drive one stream to speeds like 100 Gbps, and that is not a reasonable expectation of a high-speed network interface.
With zero-copy, which requires RHEL 8 or later, a single process might be able to achieve 100 Gbps with other appropriate tuning, such as a large MTU (e.g. 9000 bytes), CPU pinning of workloads, and Accelerated Receive Flow Steering (ARFS).
Good NUMA locality between the NIC and the application is essential. Crossing NUMA nodes is expected to heavily restrict throughput; a reduction of as much as 50% of wirespeed is a common result. For example, a 10 Gbps NIC may deliver only about 5 Gbps when a bandwidth-heavy process is not NUMA-local.
Further Tuning
The above advice assumes that the system is already correctly configured and tuned for high bandwidth. Resources to assist with such tuning are:
- How do I tune RHEL for better TCP performance over a specific network connection?
- Red Hat Enterprise Linux Network Performance Tuning Guide
- RHEL network interface dropping packets
- How do I use 'tuned' to apply tuning profiles?
- What are CPU "C-states" and how to disable them if needed?
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.