Which bonding modes work when used with a bridge that virtual machine guests or containers connect to?
Environment
- Red Hat Enterprise Linux (any version)
- Red Hat Enterprise Virtualization Hypervisor (RHEV-H) (any version)
- KVM-based or Xen-based virtual machine hypervisor
- Multiple network interfaces bonded on hypervisor
- Bonded interface in bridge
- Virtual guest network interfaces in the same bridge
- Containers (for example, LXC, Docker, Kubernetes, OpenStack, OpenShift, CoreOS, and others)
- Container configuration where the container uses its own MAC address, such as Docker's MACVLAN mode
- OpenShift Virtualization
Issue
- Which bonding modes work when used with a bridge that virtual machine guests or containers connect to?
- When certain network interface bonding modes are used on a RHEL or RHV virtualization hypervisor, with the bonded interface in a bridge into which virtual guests' network interfaces are also bridged, the guests suffer network connectivity issues such as packet loss, poor performance, or no connectivity at all.
- When using containers with certain networking modes and bonding modes, where the container uses its own MAC address and the container host does not perform NAT, the same loss of connectivity can occur.
Resolution
Working and Recommended
- Bonding Mode 2 (balance-xor)
- Bonding Mode 4 (802.3ad aka LACP)
Mode 2 (balance-xor) and Mode 4 (802.3ad / LACP) require switch configuration to establish an "EtherChannel" or similar port grouping.
Mode 2 (balance-xor) and Mode 4 (802.3ad / LACP) may require additional load-balancing configuration, depending on the source/destination of traffic being passed through the interface.
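To see why additional load-balancing configuration can matter, consider how the default layer2 transmit hash of a balance-xor bond chooses a slave. The sketch below (with made-up MAC octets and slave count) approximates the kernel's calculation; the real hash also folds in the EtherType:

```shell
# A simplified model of xmit_hash_policy=layer2: XOR the last octet of the
# source and destination MAC addresses, modulo the number of slaves.
# All values here are hypothetical.
src_mac_last=0x1e   # last octet of the source MAC
dst_mac_last=0x2f   # last octet of the destination MAC
slaves=2
echo "slave index: $(( (src_mac_last ^ dst_mac_last) % slaves ))"
```

Because the hash depends only on the address pair, all traffic between two given hosts always uses the same slave; a policy such as layer3+4 spreads individual flows more evenly.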
Note: When Mode 2 is used with xmit_hash_policy=vlan+srcmac and balance-slb=1, its behavior approximates the balance-slb mode of Open vSwitch bonding, and switch configuration is not required.
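As a sketch of the balance-slb variant described above (all device names are placeholders; NetworkManager 1.41 or later is assumed for the balance-slb option):

```shell
# Create a balance-xor bond that approximates Open vSwitch balance-slb.
# bond0, enp7s0 and enp8s0 are hypothetical names.
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=balance-xor,balance-slb=1,xmit_hash_policy=vlan+srcmac"
nmcli connection add type ethernet ifname enp7s0 master bond0
nmcli connection add type ethernet ifname enp8s0 master bond0
```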
Partially working
- Bonding Mode 1 (active-backup)
Note: Hypervisor active-backup bonding failover results in loss of guest network connectivity
Mode 1 (active-backup) does not require any switch configuration.
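A minimal sketch of an active-backup bond placed inside a Linux bridge with nmcli (all interface names are placeholders):

```shell
# Bridge for guest traffic, with the bond as its uplink port.
nmcli connection add type bridge ifname br0 con-name br0
nmcli connection add type bond ifname bond0 con-name bond0 master br0 \
    bond.options "mode=active-backup,miimon=100"
nmcli connection add type ethernet ifname enp7s0 master bond0
nmcli connection add type ethernet ifname enp8s0 master bond0
```

Guest vnet interfaces are then attached to br0 by the hypervisor.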
Working but Not Recommended
- Bonding Mode 0 (balance-rr)
- Bonding Mode 3 (broadcast)
Mode 0 (balance-rr) and Mode 3 (broadcast) require switch configuration to establish an "EtherChannel" or similar port grouping.
Mode 0 (balance-rr) and Mode 3 (broadcast) are likely not suitable for most customer workloads, or any workload where TCP throughput or ordered packet delivery is important. See: What is the best bonding or teaming mode for TCP traffic such as NFS, ISCSI, CIFS, etc?.
Red Hat Virtualization does not support Mode 0 for guest networks.
Not Working
- Bonding Mode 5 (balance-tlb)
- Bonding Mode 6 (balance-alb)
Link Monitoring
- MII Monitoring (miimon) works with all bonding modes.
- ARP Monitoring (arp_interval and arp_ip_target) of a Mode 1 (active-backup) bond does not work.
- ARP Monitoring (arp_interval and arp_ip_target) of other bonding modes works, as long as the switch ports are in an EtherChannel or similar grouping that prevents a transmitting bond slave from sending broadcast traffic to other bond slaves.
- Mode 4 (802.3ad / LACP) requires the use of miimon.
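The monitoring settings of a running bond can be inspected from procfs or sysfs (assuming an existing bond named bond0):

```shell
# Show the full bonding state, including the MII polling interval.
cat /proc/net/bonding/bond0
# Or query individual options directly:
cat /sys/class/net/bond0/bonding/miimon        # e.g. 100 (ms)
cat /sys/class/net/bond0/bonding/arp_interval  # 0 means ARP monitoring is off
```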
Root Cause
Bonding Mode 0 (balance-rr)
- Red Hat Virtualization does not support Mode 0 for guest networks.
Bonding Mode 5 (balance-tlb)
- Bonding Mode 5 (balance-tlb) works by overwriting the source MAC address of an outgoing frame with the MAC address of the outgoing interface, whenever the interface sending the traffic is not the primary interface of the bond. The bonding driver re-balances the bond every 10 seconds, which changes which interface sends which frames, and therefore which source MAC address those frames carry.
- This mode works well when the traffic source is the host system of the bond. But when bridging is used, the interfaces must be in promiscuous mode (the NIC passes all incoming frames to the host CPU, rather than accepting only frames addressed to its own MAC and discarding the rest) so that the bridge can find the correct destination for incoming frames. The source address of outgoing frames must also be correct so that the bridge can find the correct destination for the incoming replies.
- If bonding overwrites the source MAC address, as it does for frames sent from a non-primary interface, the return traffic is delivered to the bond itself and then discarded: the frames carry the destination MAC address of the bond, but a destination IP address that does not belong to the bond.
- When a broadcast packet leaves the bond for the outside network, or the switch chooses to broadcast a packet as part of its normal operation, there is nothing to stop that packet coming back in on another bond slave and being forwarded to the bridge. When such a packet originates from a bridged VM, the bridge learns that the VM's MAC address is reachable via the bond instead of via the internal vnet interface. The bridge then forwards packets destined for that VM out the bond, instead of delivering them to the VM.
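The mislearned entry can be observed and cleared with the bridge utility from iproute2 (br0, bond0 and the MAC address below are placeholders):

```shell
# List the forwarding database of the bridge; a guest MAC that shows up on
# the bond port instead of its vnet port has been mislearned.
bridge fdb show br br0
# Remove the stale entry so the bridge relearns it from the correct port:
bridge fdb del 52:54:00:aa:bb:cc dev bond0 master
```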
Bonding Mode 6 (balance-alb)
In addition to the Mode 5 points above:
- Bonding Mode 6 (balance-alb) rewrites the source MAC address of outgoing ARP replies. As a result, remote systems intermittently receive the MAC address of the bridge interface instead of the MAC address of the guest's interface. Incoming traffic is then delivered to the bridge at Layer 2 and dropped at Layer 3, similar to what happens with Mode 5.
Link Monitoring
- Any link monitoring mode where the switch might broadcast frames to an inactive link will ruin the forwarding database in the bridge, which could cause forwarding problems. It is for this reason that a Mode 1 (active-backup) bond cannot be used with ARP Monitoring.
- Mode 4 (802.3ad / LACP) requires the use of miimon.
Switch Configuration
From the bonding driver documentation:
The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).
The balance-rr, balance-xor and broadcast modes generally require that the switch have the appropriate ports grouped together. The nomenclature for such a group differs between switches, it may be called an "etherchannel" (as in the Cisco example, above), a "trunk group" or some other similar variation. For these modes, each switch will also have its own configuration options for the switch's transmit policy to the bond. Typical choices include XOR of either the MAC or IP addresses. The transmit policy of the two peers does not need to match. For these three modes, the bonding mode really selects a transmit policy for an EtherChannel group; all three will interoperate with another EtherChannel group.
The following configuration example is provided for a Cisco 2950 Layer 2 switch, based on Cisco documentation:
Switch interfaces exchange PAgP packets only with partner interfaces configured in the auto or desirable modes. Switch interfaces exchange LACP packets only with partner interfaces configured in the active or passive modes. Interfaces configured in the on mode do not exchange PAgP or LACP packets.
...
Both the active and passive LACP modes allow interfaces to negotiate with partner interfaces to determine if they can form an EtherChannel based on criteria such as interface speed and, for Layer 2 EtherChannels, trunking state, and VLAN numbers.
interface Port-channel1
description EtherChannel
!
interface FastEthernet0/1
description EtherChannel port
channel-group 1 mode active
!
interface FastEthernet0/2
description EtherChannel port
channel-group 1 mode active
If switch configuration cannot be performed, the following ebtables bridge firewall rule may work for some environments:
# ebtables -t nat -A POSTROUTING -s <VM-MAC-address> -p ARP -o bond0 -j snat --to-source <Bond-MAC-address>
Note: ebtables is not in RHEL 5, but is available as an unsupported RPM from EPEL. ebtables is in the RHEL 6 Server base channel.
- This bridge firewall statement causes any broadcast frame (or frame with an unlearned destination MAC address) to have its source MAC address rewritten to the MAC address of the bonding interface. This should avoid having the bridge's forwarding database consider the VM to be external to the hypervisor.
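Whether the rule is installed and actually matching traffic can be checked from the per-rule packet counters (assuming the rule above has been added):

```shell
# List the POSTROUTING chain of the nat table with per-rule counters;
# non-zero counters indicate the SNAT rule is matching frames.
ebtables -t nat -L POSTROUTING --Lc
```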
- This is not a confirmed workaround for all use cases, and it requires testing within an individual customer environment to confirm that it works. Mode 1 (active-backup) may be the only functioning workaround.