Network communication from an instance to an external network fails using Flat or VLAN provider networks on HP blades with Emulex NICs

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux OpenStack Platform

Issue

  • I am using HP Blade systems with Emulex network interfaces for controller and compute nodes. I have created VLAN/flat provider networks following the details in the document Is it possible to create an OpenStack instance directly connected to an external network?. However, network communication from an instance to the gateway in the provider network, or to other instances in the same network that run on a different compute node, always fails. Instances also fail to obtain an IP address via DHCP from the neutron DHCP server. How can I resolve this?

Resolution

This is caused by a known packet loop in the hardware or Virtual Connect that occurs on HP systems with Emulex network cards when SR-IOV is enabled. The solution is to disable SR-IOV in the NIC BIOS as well as in Virtual Connect. Please refer to the article Virtual Machines stop communicating over the Linux bridge when using Emulex Network cards for more details on how to disable it.

SR-IOV is enabled in the PXE BIOS of the card:

The command lspci -vvv shows whether SR-IOV is enabled in the NIC's BIOS. When it is enabled, the section for the network card contains the following (as opposed to 0 VFs when the card is not SR-IOV enabled):

        Initial VFs: 32, Total VFs: 32, Number of VFs: 32, Function Dependency Link: 00 

Disabling SR-IOV in the NIC's BIOS will restore connectivity.
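
The lspci check above can be scripted when several NICs need to be inspected. The following is a minimal sketch, not a supported tool: `list_sriov_vfs` is a hypothetical helper name, and it assumes the usual `lspci -vvv` layout in which each device header line starts in the first column with the PCI address.

```shell
# Hypothetical helper (not part of any Red Hat or HP tooling).
# Reads `lspci -vvv` output on stdin and prints "<PCI address> <Total VFs>"
# for every device whose SR-IOV capability reports a non-zero Total VFs.
list_sriov_vfs() {
    awk '
        # Device header lines start in column 1 with the PCI address.
        /^[0-9a-fA-F]/ { dev = $1 }
        /Total VFs:/ {
            n = $0
            sub(/.*Total VFs: */, "", n)   # drop everything up to the count
            sub(/[^0-9].*/, "", n)         # drop everything after the count
            if (n + 0 > 0) print dev, n
        }
    '
}
```

It could be used as `lspci -vvv | list_sriov_vfs` (run as root so that the capability section is readable). An empty result only means the NIC itself reports no VFs; it says nothing about the Virtual Connect setting.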

SR-IOV is disabled in the NIC, but the host is an HP Blade and networking is managed via Virtual Connect

In this case lspci -vvv will seemingly show that SR-IOV is disabled:

        Initial VFs: 0, Total VFs: 0, Number of VFs: 0, Function Dependency Link: 00 

In this case disabling SR-IOV via the NIC PXE BIOS is not enough; SR-IOV must also be disabled via the Virtual Connect Manager. This can be done by setting PXE=Disabled for the network interfaces in the Virtual Connect blade profile. Currently, there is no way for the operating system to infer whether SR-IOV is enabled in Virtual Connect. A Bugzilla asking for this information to be exported to the OS has been filed.

NIC firmware version

In firmware version 4.9.311.20, the NIC firmware for HP part number 554FLB has Advanced Mode disabled by default, which can turn off the SR-IOV functionality.

Upgrading to this firmware version or later, and ensuring that Advanced Mode and SR-IOV are off, should resolve the issue.

A related HP-specific advisory is HP document c04267968 (hosted on h20565.www2.hp.com).

How can I verify if I am hitting this bug?

  • Spawn an instance on the provider network, then identify the compute node where the instance runs with the command below.
# nova show instance-name | grep OS-EXT-SRV-ATTR:hypervisor_hostname | awk '{print $4}'
compute1.local
  • Identify the IP address assigned by neutron to the instance.
# nova show instance-name | grep network | awk '{print $5}'
192.168.171.19
  • Open the VNC console for the instance and log in to it. Use an image that allows console login for this test. Configure the IP address statically on the instance.
# ifconfig eth0 192.168.171.19 netmask 255.255.255.0 up
  • Start a ping from the instance to the default gateway. It will not succeed; leave it running until the following tests are finished.
# ping 192.168.171.254
  • Log in to the compute node. Identify the physical interface that connects the compute node to the provider external network through an openvswitch bridge. For more details on how to find the interface, see Is it possible to create an OpenStack instance directly connected to an external network?

  • Run tcpdump on that interface and look for ARP requests and ARP replies that contain the instance IP address and the gateway IP address. If the interface is eth0, the tcpdump output will look like this.

# tcpdump -i eth0
15:20:03.050558 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:03.050583 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:03.050835 ARP, Reply 192.168.171.19 is-at 00:17:a4:77:10:2c, length 38
15:20:04.053148 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:04.053216 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:04.053307 ARP, Reply 192.168.171.19 is-at 00:17:a4:77:10:2c, length 38
15:20:05.055105 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:05.055166 ARP, Request who-has 192.168.171.19 tell 192.168.171.254, length 28
15:20:05.055200 ARP, Reply 192.168.171.19 is-at 00:17:a4:77:10:2c, length 38
  • Here we can see that each ARP Request is duplicated. The first ARP Request is the original, and the second is a copy looped back by the HP NIC or Virtual Connect. If you see this kind of loop in the tcpdump output for the interface, it is a hardware issue and needs to be resolved by involving the hardware or switch vendor.
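
Spotting the duplicates by eye is easy in a short capture but tedious in a long one. The sketch below counts ARP Requests that repeat the immediately preceding request within a short time window. `count_dup_arp_requests` is a hypothetical helper, and the default window of 1000 microseconds is an assumption chosen because the duplicates in the capture above arrive well under 100 microseconds apart; it assumes tcpdump's default HH:MM:SS.uuuuuu timestamps.

```shell
# Hypothetical helper: reads tcpdump text output on stdin and prints the number
# of ARP Requests that repeat the previous ARP Request within a time window.
# $1 (optional): window in microseconds; 1000 is an assumed default.
count_dup_arp_requests() {
    awk -v window_us="${1:-1000}" '
        /ARP, Request/ {
            # $1 is the timestamp HH:MM:SS.uuuuuu; convert it to microseconds.
            split($1, t, /[:.]/)
            now = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000000 + t[4]
            # Everything after the timestamp identifies the request.
            body = $0
            sub(/^[^ ]+ /, "", body)
            if (body == last_body && now - last_time <= window_us) dups++
            last_body = body
            last_time = now
        }
        END { print dups + 0 }
    '
}
```

A possible invocation is `tcpdump -n -i eth0 -c 200 arp | count_dup_arp_requests`. A non-zero count is only a hint that frames are being looped back; it does not by itself prove the hardware issue.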

Why does this loop cause network connection failure with openvswitch?


Openvswitch is a software switch that works like a normal switch. It has ports, and each port connects, as if by a virtual cable, to an interface with a MAC address at the other end. The switch automatically learns MAC address -> port mappings and associates each learned MAC address with a specific port to avoid broadcast flooding. Let us assume that the openvswitch bridge that connects the compute node is `br-ex` and the instance MAC address is reachable via the `phy-br-ex` patch port. When the first ARP Request comes in, it reaches the bridge through `phy-br-ex`. The bridge learns that the instance MAC address needs to be associated with this port and records that.

Then comes the looped copy of the ARP Request from the hardware. It reaches the bridge via the physical interface eth0. At this point the br-ex bridge learns again, this time that the MAC address is outside the system and reachable via eth0, and associates the MAC address with the eth0 port.

Then comes the ARP Reply from the gateway. At this point, br-ex has an entry in its MAC table that says to send the ARP Reply to eth0 instead of phy-br-ex, so the ARP Reply never reaches the instance.
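
The sequence above can be illustrated with a toy model of MAC learning. This is only a sketch of the general "last ingress port seen wins" behaviour of a learning switch, not openvswitch's actual implementation; `learn_macs` and its input format (one "<ingress port> <source MAC>" pair per line, in arrival order) are made up for the illustration.

```shell
# Toy model of a learning switch (illustration only, not openvswitch code).
# Input on stdin: one frame per line as "<ingress port> <source MAC>",
# in arrival order. Output: the resulting "<MAC> <port>" table, sorted.
learn_macs() {
    awk '
        { table[$2] = $1 }    # the most recent ingress port for a MAC wins
        END { for (mac in table) print mac, table[mac] }
    ' | sort
}
```

Feeding it the original frame followed by the looped copy, for example `printf 'phy-br-ex fa:16:3e:74:00:9b\neth0 fa:16:3e:74:00:9b\n' | learn_macs`, leaves the instance MAC associated with eth0, which is the wrong port for delivering the reply.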

  • To check the port names and port numbers in a bridge, run the command below.
# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000017a4771014
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(eth0): addr:00:17:a4:77:10:14
     config:     0
     state:      0
     current:    10GB-FD
     speed: 10000 Mbps now, 0 Mbps max
 2(phy-br-ex): addr:5e:b6:f8:49:06:41
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:00:17:a4:77:10:14
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

This shows that the port number for eth0 is 1 and the port number for phy-br-ex is 2.
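
If the bridge has many ports, the number-to-name mapping can be pulled out of the listing with a small filter. This is a sketch that assumes the port lines have the `N(name): addr:...` shape shown above; `list_bridge_ports` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ovs-ofctl show <bridge>` output on stdin and
# prints "<port number> <port name>" for each port line.
list_bridge_ports() {
    awk '
        # Port lines look like " 1(eth0): addr:00:17:a4:77:10:14".
        /^ *[0-9A-Z]+\([^)]*\): addr:/ {
            split($1, p, /[()]/)   # "1(eth0):" -> p[1]="1", p[2]="eth0"
            print p[1], p[2]
        }
    '
}
```

A possible invocation is `ovs-ofctl show br-ex | list_bridge_ports`.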

  • To see which MAC address is mapped to which port, run the command below.
# ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    1     0  00:2a:6a:8c:d6:c4   37
    1     0  00:2a:6a:8c:dc:44   37
    1     0  fa:16:3e:af:14:8c   26
    1     0  fa:16:3e:c7:34:80   19
    1     0  fa:16:3e:68:e9:be   19
    1     0  fa:16:3e:db:21:ae   19
    1     0  00:00:5e:00:01:01    1
    1     0  fc:15:b4:fa:58:af    1
    1     0  00:17:a4:77:10:2c    1
    2     0  fa:16:3e:74:00:9b    1
  • All the MAC addresses in the above example that start with fa:16:3e are MAC addresses of instances. They should have been mapped to port 2 (phy-br-ex), but most of them are mapped to port 1 (eth0) due to the loop, which breaks network communication. You can verify this in your environment by using the above method and checking which port your instance's MAC address is mapped to.
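
The check described above can be automated with a small filter over the `fdb/show` output. This sketch assumes, as in the example, that instance MAC addresses start with fa:16:3e and that the physical interface's port number is known; `misplaced_instance_macs` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ovs-appctl fdb/show <bridge>` output on stdin and
# prints every MAC with the fa:16:3e prefix that was learned on the given port.
# $1: OpenFlow port number of the physical interface (1 for eth0 above).
misplaced_instance_macs() {
    awk -v port="$1" '
        # Columns are: port, VLAN, MAC, Age. The header line never matches
        # because its first field is the word "port".
        $1 == port && tolower($3) ~ /^fa:16:3e:/ { print $3 }
    '
}
```

A possible invocation is `ovs-appctl fdb/show br-ex | misplaced_instance_macs 1`. Instances on other compute nodes are legitimately reachable through the physical port, so compare the output against the MAC addresses of instances running on the local node.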

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.