Virtual Machines stop communicating over the Linux bridge when using Emulex Network cards
Environment
- Red Hat Enterprise Linux
- Virtual Machines running on the host via KVM and connected via the Linux software bridge
- Emulex NICs
Issue
On a physical host running several virtual machines that reach the external network through a bridge, traffic between a VM and the external network suddenly stops working. The host NICs attached to the bridge are made by Emulex.
Resolution
The issue is that certain cards reflect packets back to the port they arrived from when SR-IOV is enabled. For a more complete explanation of the software bridge behaviour, please see: https://access.redhat.com/site/solutions/750553
This can happen in the following situations:
SR-IOV is enabled in the PXE BIOS of the card:
The command lspci -vvv shows whether SR-IOV is enabled in the NIC's BIOS. The section for the network card will contain the following line (as opposed to 0 VFs when the card is not SR-IOV enabled):
Initial VFs: 32, Total VFs: 32, Number of VFs: 32, Function Dependency Link: 00
Disabling SR-IOV in the NIC's BIOS will restore connectivity.
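The check above can be scripted. The following is a minimal sketch in Python, assuming the lspci -vvv output has already been captured as text; the names VF_LINE, sriov_vf_counts and sriov_looks_enabled are hypothetical and not part of any Red Hat tool.

```python
import re

# Matches the SR-IOV capability line as it appears in `lspci -vvv` output.
VF_LINE = re.compile(
    r"Initial VFs:\s*(\d+),\s*Total VFs:\s*(\d+),\s*Number of VFs:\s*(\d+)"
)


def sriov_vf_counts(lspci_output):
    """Return one (initial, total, number) tuple per SR-IOV line found."""
    return [
        tuple(int(n) for n in match.groups())
        for match in VF_LINE.finditer(lspci_output)
    ]


def sriov_looks_enabled(lspci_output):
    """True if any SR-IOV capability line reports a non-zero VF count."""
    return any(any(counts) for counts in sriov_vf_counts(lspci_output))
```

A result of False only means that the operating system sees no VFs. On HP blades managed via Virtual Connect, SR-IOV may still be active even though lspci reports 0 VFs.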
SR-IOV is disabled in the NIC but the host is an HP Blade whose networking is managed via Virtual Connect:
In this case lspci -vvv will seemingly show that SR-IOV is disabled:
Initial VFs: 0, Total VFs: 0, Number of VFs: 0, Function Dependency Link: 00
In this case disabling SR-IOV via the NIC PXE BIOS is not enough and SR-IOV must also be disabled via the Virtual Connect Manager. This can be done by setting PXE=Disabled for the network interfaces in the Virtual Connect blade profile. Currently, there is no way for the Operating System to infer whether SR-IOV is enabled in Virtual Connect. A bugzilla asking for this information to be exported to the OS has been filed.
NIC firmware version
NIC firmware for HP part number 554FLB under version 4.9.311.20 has the Advanced Mode disabled by default, which can turn off the SR-IOV functionality.
Upgrading to this firmware version or later, and ensuring that Advanced Mode and SR-IOV are off, should resolve the issue.
A related HP-specific advisory is c04267968 on h20565.www2.hp.com.
Workaround
A possible workaround until SR-IOV is fully disabled is to make the bridge behave like a hub via brctl setageing <bridge> 0. The bridge then floods every packet to all ports except the one on which the packet arrived. This has a performance impact, because all packets are sent to all ports, and a security impact, because each virtual machine will see packets destined for all the other virtual machines as well.
Root Cause
This issue occurs due to the way the virtual Ethernet bridge in Emulex Network cards works. All transmitted broadcast packets are looped back by the controller. This affects the functionality of the Linux software bridge, because it appears as if the same ARP broadcast packets are received on two different interfaces. Note that this happens only on cards that loop back all broadcast packets. If the NIC does not do that, even with SR-IOV enabled, this specific issue won't be present.
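The effect of the looped-back broadcast on a MAC-learning bridge can be illustrated with a toy model. This is a simplified sketch, not the kernel bridge implementation: the class LearningBridge and the helper arp_exchange are hypothetical, and an ageing time of 0 is modelled simply as "never learn, always flood".

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
VM_MAC = "52:54:00:2b:d3:b2"
GW_MAC = "00:00:0c:9f:f0:00"


class LearningBridge:
    """Toy model of a MAC-learning bridge (illustration only)."""

    def __init__(self, ports, learning=True):
        self.ports = list(ports)
        self.learning = learning  # False approximates an ageing time of 0
        self.fdb = {}             # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Process one frame; return the ports it is sent out of."""
        if self.learning:
            self.fdb[src] = in_port
        out = self.fdb.get(dst) if self.learning else None
        if out is None:
            # Broadcast or unknown destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        # Known destination: drop the frame if it lives on the arrival port.
        return [] if out == in_port else [out]


def arp_exchange(bridge, reflect):
    """Send an ARP request from the VM; return where the reply is forwarded."""
    bridge.receive("vnet0", VM_MAC, BROADCAST)     # request leaves the VM
    if reflect:
        bridge.receive("eth0", VM_MAC, BROADCAST)  # NIC loops it back
    return bridge.receive("eth0", GW_MAC, VM_MAC)  # gateway reply arrives
```

Under these assumptions, arp_exchange(LearningBridge(["eth0", "vnet0"]), reflect=False) forwards the reply to vnet0, while the same call with reflect=True returns an empty list because the bridge has relearned the VM's MAC address on eth0. With learning=False the reply is flooded to vnet0 again, which is the hub-like behaviour the workaround relies on.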
Diagnostic Steps
In an environment with an SR-IOV NIC and the following configuration:
eth0<-->br0<-->vnet0<--VM-->eth0
We will observe two behaviours:
SR-IOV disabled
With SR-IOV disabled, if we perform an ARP resolution of the gateway from the VM, we will see the following:
- In the VM on eth0:
652 33.575036 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
653 33.577890 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
- On vnet0 on the host:
357 18.002586 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
358 18.005323 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
- On br0 on the host:
651 18.129313 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
652 18.132042 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
- On the eth0 NIC on the host:
653 18.129296 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
654 18.132016 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
In this scenario everything works as expected.
SR-IOV enabled
When SR-IOV is enabled, either via the NIC PXE BIOS or via Virtual Connect, we will observe the following:
- In the VM on eth0. Notice how frames 504 and 527 are seen too quickly (ARP requests should be one second apart) and how each reflected frame is smaller than the minimum Ethernet frame size (60 bytes), meaning that it didn't travel over the wire:
503 26.515740 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
504 26.516009 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
526 27.515744 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
527 27.516067 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
- On vnet0 on the host:
423 22.233080 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
424 22.233188 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
446 23.233081 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
447 23.233196 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
- On br0 on the host:
654 22.252122 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
655 22.252226 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
656 22.254932 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
705 23.252123 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
706 23.252230 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
707 23.254825 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
- On eth0 on the host:
658 22.252109 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
659 22.252202 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
660 22.254918 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
709 23.252110 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70
710 23.252208 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.65.211.254? Tell 10.65.210.70 <--- reflected
711 23.254811 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2 ARP 60 10.65.211.254 is at 00:00:0c:9f:f0:00
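This confirms that the reflection occurs when SR-IOV is enabled in the NIC BIOS. In the captures above the ARP reply from the gateway reaches eth0 and br0 but never appears on vnet0 or inside the VM, so the VMs will likely lose network connectivity because the bridge is confused about where the VM's MAC address lives.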
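Spotting the reflected frames in a long capture by eye is tedious. The following Python sketch flags broadcast frames that repeat an identical earlier frame from the same source within a short time window. It assumes capture lines in the text layout shown in the captures above (frame number, time, source, ->, destination, protocol, length, info); the function names and the 10 ms default window are assumptions chosen for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def parse_frame(line):
    """Split one capture line into (number, time, src, dst, info)."""
    # Drop any trailing hand-written annotation such as "<--- reflected".
    fields = line.split("<---")[0].split()
    # Assumed layout: number, time, src, '->', dst, protocol, length, info...
    return (int(fields[0]), float(fields[1]), fields[2], fields[4],
            " ".join(fields[7:]))


def find_reflections(lines, window=0.01):
    """Return numbers of broadcast frames repeated within `window` seconds."""
    reflected = []
    last_seen = {}  # (src, info) -> time of the most recent matching broadcast
    for line in lines:
        if not line.strip():
            continue
        number, time, src, dst, info = parse_frame(line)
        if dst != BROADCAST:
            continue
        key = (src, info)
        if key in last_seen and time - last_seen[key] < window:
            reflected.append(number)
        last_seen[key] = time
    return reflected
```

Because genuine ARP retries are about one second apart, a window of a few milliseconds separates them from looped-back copies. A capture taken with SR-IOV disabled should yield an empty list.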