Virtual Machines stop communicating over the Linux bridge when using Emulex Network cards

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux
  • Virtual Machines running on the host via KVM and connected via the Linux software bridge
  • Emulex NICs

Issue

On a physical host running several virtual machines that reach the external network through a Linux bridge, traffic between a VM and the external network suddenly stops working. The host NICs attached to the bridge are produced by Emulex.

Resolution

The issue is that, when SR-IOV is enabled, certain cards reflect packets back to the host that transmitted them. For a more complete explanation of the software bridge behaviour, please see: https://access.redhat.com/site/solutions/750553

There are two reasons why this can happen:

SR-IOV is enabled in the PXE BIOS of the card:

Via the command lspci -vvv we can observe when SR-IOV is enabled in the NIC's BIOS. In the section related to the network card we will see the following (as opposed to 0 VFs when the card is not SR-IOV enabled):

        Initial VFs: 32, Total VFs: 32, Number of VFs: 32, Function Dependency Link: 00 

Disabling SR-IOV in the NIC's BIOS will restore connectivity.
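
The check above can be scripted. The following is a minimal sketch, assuming the SR-IOV capability line has exactly the format shown above; the function names are hypothetical and not part of any Red Hat tooling. Note that on HP blades managed by Virtual Connect a result of 0 VFs is not conclusive, as described in the next section.

```python
import re

# Matches the SR-IOV capability line printed by `lspci -vvv`, e.g.
# "Initial VFs: 32, Total VFs: 32, Number of VFs: 32, Function Dependency Link: 00"
VF_LINE = re.compile(
    r"Initial VFs:\s*(\d+),\s*Total VFs:\s*(\d+),\s*Number of VFs:\s*(\d+)"
)


def sriov_vf_counts(lspci_text):
    """Return one (initial, total, number) tuple per SR-IOV line found."""
    return [tuple(int(n) for n in m.groups()) for m in VF_LINE.finditer(lspci_text)]


def sriov_looks_enabled(lspci_text):
    """True if any SR-IOV capability line reports a non-zero VF count."""
    return any(any(counts) for counts in sriov_vf_counts(lspci_text))
```

To use it on a live host, pass in the text captured from lspci -vvv, for example via subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout.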

SR-IOV is disabled in the NIC but the host is an HP Blade. Networking is managed via Virtual Connect

In this case lspci -vvv will seemingly show that SR-IOV is disabled:

        Initial VFs: 0, Total VFs: 0, Number of VFs: 0, Function Dependency Link: 00 

In this case, disabling SR-IOV via the NIC PXE BIOS is not enough; SR-IOV must also be disabled via the Virtual Connect Manager. This can be done by setting PXE=Disabled for the network interfaces in the Virtual Connect blade profile. Currently, there is no way for the operating system to infer whether SR-IOV is enabled in Virtual Connect. A Bugzilla asking for this information to be exported to the OS has been filed.

NIC firmware version

NIC firmware version 4.9.311.20 for HP part number 554FLB has Advanced Mode disabled by default, which can turn off the SR-IOV functionality.

Upgrading to this firmware version or later, and ensuring that Advanced Mode and SR-IOV are off, should resolve the issue.

A related HP-specific advisory is c04267968 (hosted on h20565.www2.hp.com).

Workaround

A possible workaround until SR-IOV is fully disabled is to make the bridge behave like a hub via brctl setageing <bridge> 0. With an ageing time of 0, the bridge floods every packet to all ports except the one on which it arrived. This has a performance impact, because every packet is sent to all ports, and a security impact, because each virtual machine also sees packets destined for all the other virtual machines.

Root Cause

This issue occurs due to the way the virtual Ethernet bridge in Emulex network cards works. All transmitted broadcast packets are looped back by the controller. This affects the functionality of the Linux software bridge, because it appears as if the same ARP broadcast packets are received on two different interfaces. Note that this happens only on cards that loop back all broadcast packets. If the NIC does not do that, even with SR-IOV enabled, this specific issue won't be present.
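
To illustrate why a looped-back broadcast breaks the bridge, here is a toy model of a MAC-learning bridge. It is a simplified sketch and not the kernel implementation: the class and function names are hypothetical, entries never expire, and an ageing time of 0 is modelled as "learn nothing", which approximates the hub-like behaviour described in the Workaround section. The MAC addresses are the ones that appear in the captures under Diagnostic Steps.

```python
VM = "52:54:00:2b:d3:b2"         # MAC address of the VM
GW = "00:00:0c:9f:f0:00"         # MAC address of the gateway
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy model of a MAC-learning bridge (illustration only)."""

    def __init__(self, ports, ageing=300):
        self.ports = list(ports)
        self.ageing = ageing  # 0 is modelled as "never learn", i.e. hub mode
        self.fdb = {}         # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Learn the source, then return the ports the frame is forwarded to."""
        if self.ageing > 0:
            self.fdb[src] = in_port
        out = self.fdb.get(dst)
        if out is None:
            # Unknown or broadcast destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination is believed to sit behind the ingress port: drop.
            return []
        return [out]


def gateway_reply_ports(ageing, reflect=True):
    """Ports that receive the gateway's ARP reply after the VM's request."""
    br = LearningBridge(["eth0", "vnet0"], ageing=ageing)
    br.receive("vnet0", VM, BROADCAST)     # ARP request leaves the VM
    if reflect:
        br.receive("eth0", VM, BROADCAST)  # NIC loops the request back in
    return br.receive("eth0", GW, VM)      # ARP reply from the gateway
```

In this model, gateway_reply_ports(300) returns an empty list: the reflected frame makes the bridge learn the VM's MAC address on eth0, so the reply arriving on eth0 is dropped instead of being forwarded to vnet0. With ageing set to 0, or without the reflection, the reply reaches vnet0.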

Diagnostic Steps

In an environment with an SR-IOV NIC and the following configuration:
eth0<-->br0<-->vnet0<--VM-->eth0

We will observe two behaviours:

SR-IOV disabled

With SR-IOV disabled, an ARP resolution of the gateway from the VM shows the following:

  1. In the VM on eth0:
652  33.575036 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
653  33.577890 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
  2. On vnet0 on the host:
357  18.002586 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
358  18.005323 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
  3. On br0 on the host:
651  18.129313 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
652  18.132042 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
  4. On eth0 on the host:
653  18.129296 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
654  18.132016 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00

In this scenario everything works as expected.

SR-IOV enabled

When SR-IOV is enabled, either via the NIC PXE BIOS or via Virtual Connect, we will observe the following:

  1. In the VM on eth0. Notice how frames 504 and 527 arrive too quickly after the preceding request (successive ARP requests should be one second apart) and how they are only 42 bytes long, smaller than the minimum Ethernet frame size of 60 bytes, meaning they did not travel over the wire. No ARP reply reaches the VM:
503  26.515740 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
504  26.516009 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
526  27.515744 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
527  27.516067 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
  2. On vnet0 on the host:
423  22.233080 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
424  22.233188 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
446  23.233081 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
447  23.233196 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
  3. On br0 on the host:
654  22.252122 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
655  22.252226 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
656  22.254932 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
705  23.252123 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
706  23.252230 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
707  23.254825 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
  4. On eth0 on the host:
658  22.252109 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
659  22.252202 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
660  22.254918 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00
709  23.252110 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70
710  23.252208 52:54:00:2b:d3:b2 -> ff:ff:ff:ff:ff:ff    ARP 42   Who has 10.65.211.254?  Tell 10.65.210.70 <--- reflected
711  23.254811 00:00:0c:9f:f0:00 -> 52:54:00:2b:d3:b2    ARP 60   10.65.211.254 is at 00:00:0c:9f:f0:00

This confirms that, with SR-IOV enabled, the broadcast frames sent by the VM are reflected back by the NIC, and that the VMs will likely lose network connectivity because the bridge sees the VM's MAC address on two different ports.
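
Reflected frames can also be picked out of a text capture automatically. The following is a minimal sketch, assuming capture lines in the column layout shown above (frame number, timestamp, source, destination, protocol, length, info). The function name and the 10 ms window are assumptions chosen for illustration, based on genuine ARP retries being about one second apart.

```python
import re

# frame number, timestamp, source MAC, destination MAC, protocol, length, info
CAPTURE_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+(\S+)\s+(\d+)\s+(.*?)\s*$"
)


def find_reflected(lines, window=0.01):
    """Return frame numbers that repeat the previous frame within `window` seconds."""
    reflected = []
    previous = None  # (timestamp, key) of the last parsed frame
    for line in lines:
        # Ignore any "<--- reflected" style annotation at the end of the line.
        m = CAPTURE_LINE.match(line.split("<---")[0])
        if not m:
            continue
        number, stamp = int(m.group(1)), float(m.group(2))
        key = m.group(3, 4, 5, 6, 7)  # everything except number and timestamp
        if previous and key == previous[1] and stamp - previous[0] <= window:
            reflected.append(number)
        previous = (stamp, key)
    return reflected
```

Run against the br0 capture above, this flags frames 655 and 706. It returns an empty list for the captures taken with SR-IOV disabled.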

