How are the values for the different policies of the "xmit_hash_policy" bonding parameter calculated?
Environment
- Red Hat Enterprise Linux
- Bonding driver providing link aggregation in Mode 2 (balance-xor), Mode 4 (802.3ad, aka LACP), Mode 5 (balance-tlb) or Mode 6 (balance-alb)
Issue
- How are the values for the different policies of the xmit_hash_policy bonding parameter calculated?
- We need to understand the practical implementation of the logic and math behind the load balancing algorithms.
- How are the algorithms employed for each of the six policies layer2, layer2+3, layer3+4, encap2+3, encap3+4 and vlan+srcmac?
- What formula is used to compute the network bonding hashing policies?
- What are the different hash policies in network bonding, and how are they configured?
Resolution
Configuration
The xmit_hash_policy load balancing parameter can be used with mode=2, mode=4, mode=5 and mode=6. However, with mode=5 and mode=6 it is applied only if tlb_dynamic_lb=0 has been set.
For example, to configure bondX as mode=2 (balance-xor) with xmit_hash_policy=layer2+3:
### If using the network service, modify BONDING_OPTS in ifcfg-bondX to:
BONDING_OPTS="miimon=100 mode=2 xmit_hash_policy=layer2+3"
### If using NetworkManager, run the following, where bondX is the name of the bond connection:
# nmcli con modify bondX bond.options "miimon=100,mode=2,xmit_hash_policy=layer2+3"
Complete configuration for bonding devices is discussed at:
layer2
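The layer2 policy uses the XOR of the source and destination MAC addresses and the Ethernet protocol type. In the implementation (bond_eth_hash() under Diagnostic Steps), only the last byte of each MAC address enters the hash.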
The calculation is:
hash = source MAC XOR destination MAC XOR packet type ID
slave number = hash modulo slave count
This algorithm will place all traffic to a particular network peer on the same slave.
If network traffic is between this system and multiple other systems in the same broadcast domain, this is a good algorithm.
If network traffic is mostly between this system and multiple other systems behind a default gateway, another algorithm should be considered.
This algorithm is 802.3ad compliant.
This is the default policy if no configuration is provided.
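The layer2 calculation above can be sketched in user-space C as follows. This is only an illustration under stated assumptions, not the kernel code: the names layer2_hash and pick_slave are hypothetical, only the last byte of each MAC address is used (as in bond_eth_hash() under Diagnostic Steps), and the protocol type is passed as a plain number such as 0x0800 for IPv4, ignoring kernel byte-order handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: XOR of the last MAC bytes and the protocol type. */
static uint32_t layer2_hash(const uint8_t src_mac[6], const uint8_t dst_mac[6],
                            uint16_t proto)
{
    return (uint32_t)(src_mac[5] ^ dst_mac[5] ^ proto);
}

/* Hypothetical helper: map a hash value onto one of slave_count slaves. */
static uint32_t pick_slave(uint32_t hash, uint32_t slave_count)
{
    assert(slave_count > 0); /* a bond with no slaves cannot transmit */
    return hash % slave_count;
}
```

With the MAC addresses 00:1b:21:74:b6:39 (source) and 00:1a:22:12:34:59 (destination) and protocol type 0x0800, layer2_hash() returns 0x0860, and pick_slave(0x0860, 2) returns 0.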
layer2+3
The layer2+3 policy uses the XOR of source and destination MAC addresses and IP addresses.
The calculation is:
hash = source MAC XOR destination MAC XOR packet type ID
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
slave number = hash modulo slave count
This algorithm will place all traffic to a particular IP address on the same slave.
If network traffic between this system and multiple other systems goes through a default gateway, this is a good algorithm.
If network traffic is mostly between this system and one other system, another algorithm should be considered.
For non-IP traffic, the formula is the same as for the layer2 transmit policy.
This algorithm is 802.3ad compliant.
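The layer2+3 steps above can be sketched in user-space C as follows. This is a simplified illustration, not the kernel implementation: the function names are hypothetical, only IPv4 is modelled, addresses are passed as plain 32-bit numbers, and the MAC contribution is reduced to the last byte of each address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: L2 seed hash folded together with IPv4 addresses. */
static uint32_t layer23_hash(uint8_t src_mac_last, uint8_t dst_mac_last,
                             uint16_t proto, uint32_t src_ip, uint32_t dst_ip)
{
    uint32_t hash = (uint32_t)(src_mac_last ^ dst_mac_last ^ proto);

    hash ^= src_ip ^ dst_ip;
    hash ^= hash >> 16;   /* fold the upper half into the lower half */
    hash ^= hash >> 8;    /* fold again so the low byte depends on all bytes */
    return hash;
}

/* Hypothetical helper: select the slave index for a given slave count. */
static uint32_t layer23_slave(uint32_t hash, uint32_t slave_count)
{
    assert(slave_count > 0);
    return hash % slave_count;
}
```

For example, layer23_hash(0x39, 0x59, 0x0800, 0xA9FE5C40, 0xC0A8010A) returns 0x693F6A40, which selects slave 0 when there are two slaves.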
layer3+4
The layer3+4 policy uses the XOR of source and destination ports and IP addresses.
The calculation is:
hash = source port and destination port, taken together as one 32-bit value (as in the header)
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
hash = hash RSHIFT 1
slave number = hash modulo slave count
If network traffic between this system and another system uses the same IPs but multiple ports, this algorithm is a good choice.
For non-IP traffic, the formula is the same as for the layer2 transmit policy.
This algorithm is not 802.3ad compliant.
For fragmented TCP or UDP packets and all other IP protocol traffic, the source and destination port information is omitted. This policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products.
A single TCP or UDP conversation containing both fragmented and unfragmented packets may see traffic balanced across two interfaces, which may result in out-of-order delivery. Most traffic types will not meet these criteria, as TCP rarely fragments traffic, and most UDP traffic is not involved in extended conversations. Other implementations of 802.3ad may or may not tolerate this noncompliance.
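The layer3+4 steps, including the case where port information is omitted, can be sketched in user-space C as follows. This is an illustration under assumptions rather than the kernel code: the function names and the use_ports flag are hypothetical, only IPv4 is modelled, and addresses and ports are passed as plain numbers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper. Pass use_ports = 0 to model fragmented packets and
 * non-TCP/UDP traffic, where the port information is omitted.
 */
static uint32_t layer34_hash(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port,
                             int use_ports)
{
    uint32_t hash = 0;

    if (use_ports)
        hash = ((uint32_t)src_port << 16) | dst_port;
    hash ^= src_ip ^ dst_ip;
    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash >> 1;     /* drop the lowest bit */
}

/* Hypothetical helper: select the slave index for a given slave count. */
static uint32_t layer34_slave(uint32_t hash, uint32_t slave_count)
{
    assert(slave_count > 0);
    return hash % slave_count;
}
```

For the same pair of addresses, the hash with ports and the hash without ports generally differ, which is why a conversation that mixes fragmented and unfragmented packets can end up on two different slaves.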
encap2+3
This policy uses the same formula as layer2+3, but it relies on skb_flow_dissect to obtain the header fields, which might result in the use of inner headers if an encapsulation protocol is used.
This can improve performance for tunnel users because the packets are distributed according to the encapsulated flows.
encap3+4
This policy uses the same formula as layer3+4, but it relies on skb_flow_dissect to obtain the header fields, which might result in the use of inner headers if an encapsulation protocol is used.
This can improve performance for tunnel users because the packets are distributed according to the encapsulated flows.
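The effect of hashing on inner headers can be sketched with a simplified user-space model in C. Everything here is an assumption made for illustration: struct flow, struct packet and the function names are hypothetical, the real header parsing is done by the kernel flow dissector, and the example assumes a tunnel whose outer headers are identical for every encapsulated flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields extracted from one header set. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Hypothetical packet model: outer (tunnel) headers plus optional inner ones. */
struct packet {
    struct flow outer;
    struct flow inner;
    int encapsulated;     /* non-zero if inner headers are present */
};

/* Same folding steps as the layer3+4 calculation. */
static uint32_t flow_hash34(const struct flow *f)
{
    uint32_t hash = ((uint32_t)f->src_port << 16) | f->dst_port;

    hash ^= f->src_ip ^ f->dst_ip;
    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash >> 1;
}

/* use_inner = 0 models layer3+4, use_inner = 1 models encap3+4. */
static uint32_t packet_hash34(const struct packet *p, int use_inner)
{
    assert(p != NULL);
    if (use_inner && p->encapsulated)
        return flow_hash34(&p->inner);
    return flow_hash34(&p->outer);
}
```

In this model, two encapsulated flows that share the same outer headers always get the same hash when use_inner is 0, but can get different hashes, and therefore different slaves, when use_inner is 1.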
vlan+srcmac
The vlan+srcmac policy uses the XOR of the VLAN ID, the source MAC vendor part (first three bytes) and the source MAC device part (last three bytes).
The calculation is:
hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
slave number = hash modulo slave count
This policy uses a very rudimentary VLAN ID and source MAC hash to load-balance traffic per VLAN, with failover should one leg fail.
The intended use case is a bond shared by multiple virtual machines, each configured to use its own VLAN, to give LACP-like functionality without requiring LACP-capable switching hardware.
This feature is available from RHEL 8.4 or kernel-4.18.0-305.el8 onwards.
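The vlan+srcmac calculation can be sketched in user-space C as follows. This is a minimal illustration modelled on bond_vlan_srcmac_hash() under Diagnostic Steps, not the kernel code itself: the function name and the has_vlan parameter are hypothetical stand-ins for the skb VLAN tag checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: hash on VLAN ID and the two halves of the source MAC. */
static uint32_t vlan_srcmac_hash(const uint8_t src_mac[6], int has_vlan,
                                 uint16_t vlan_id)
{
    uint32_t vendor = 0, dev = 0;
    int i;

    assert(src_mac != NULL);
    for (i = 0; i < 3; i++)           /* first three bytes: vendor part */
        vendor = (vendor << 8) | src_mac[i];
    for (i = 3; i < 6; i++)           /* last three bytes: device part */
        dev = (dev << 8) | src_mac[i];

    if (!has_vlan)                    /* untagged frames hash on the MAC only */
        return vendor ^ dev;
    return vlan_id ^ vendor ^ dev;
}
```

For source MAC 00:1b:21:74:b6:39, the function returns 0x74AD7C for VLAN 100 and 0x74AD7D for VLAN 101, so with two slaves the two VLANs are sent through different slaves.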
Single Stream
For traffic where the primary use is a single large Layer 4 stream, such as a single NFS mount, or single iSCSI target/initiator, or other persistent single TCP/UDP connection, this traffic cannot be load balanced.
If a single persistent stream is required to go faster, faster network interfaces and network infrastructure must be used.
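The difference between one stream and many streams can be illustrated with a short user-space sketch in C. It assumes the layer3+4 policy, IPv4, and plain numeric addresses and ports. The helper count_flows_per_slave is hypothetical and simply opens a number of flows to one peer from consecutive source ports.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified layer3+4 hash following the calculation in this article. */
static uint32_t layer34_hash(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port)
{
    uint32_t hash = ((uint32_t)src_port << 16) | dst_port;

    hash ^= src_ip ^ dst_ip;
    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash >> 1;
}

/*
 * Hypothetical helper: count how many of `flows` connections, each using the
 * next consecutive source port, are assigned to each slave.
 */
static void count_flows_per_slave(uint32_t src_ip, uint32_t dst_ip,
                                  uint16_t first_src_port, uint16_t dst_port,
                                  unsigned int flows, unsigned int slave_count,
                                  unsigned int counts[])
{
    unsigned int i;

    assert(slave_count > 0);
    for (i = 0; i < slave_count; i++)
        counts[i] = 0;
    for (i = 0; i < flows; i++) {
        uint16_t src_port = (uint16_t)(first_src_port + i);

        counts[layer34_hash(src_ip, dst_ip, src_port, dst_port) % slave_count]++;
    }
}
```

With two slaves, a single flow is counted on exactly one slave no matter how much data it carries, while eight flows from source ports 40000 to 40007 are split four and four.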
Diagnostic Steps
The relevant code that deals with the hash policies is:
5.14.0-284.11.1.el9/drivers/net/bonding/bond_main.c
The following xmit policies are available:
#define BOND_XMIT_POLICY_LAYER2 0 /* layer 2 (MAC only), default */
#define BOND_XMIT_POLICY_LAYER34 1 /* layer 3+4 (IP ^ (TCP || UDP)) */
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */
/**
 * bond_xmit_hash - generate a hash value based on the xmit policy
 * @bond: bonding device
 * @skb: buffer to use for headers
 *
 * This function will extract the necessary headers from the skb buffer and use
 * them to generate a hash based on the xmit_policy set in the bonding device
 */
u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
{
    if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
        skb->l4_hash)
        return skb->hash;
    return __bond_xmit_hash(bond, skb, skb->data, skb->protocol,
                            skb_mac_offset(skb), skb_network_offset(skb),
                            skb_headlen(skb));
}
/* Generate hash based on xmit policy. If @skb is given it is used to linearize
 * the data as required, but this function can be used without it if the data is
 * known to be linear (e.g. with xdp_buff).
 */
static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
                            __be16 l2_proto, int mhoff, int nhoff, int hlen)
{
    struct flow_keys flow;
    u32 hash;
    if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
        return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
    if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
        !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
        return bond_eth_hash(skb, data, mhoff, hlen);
    if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
        bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
        hash = bond_eth_hash(skb, data, mhoff, hlen);
    } else {
        if (flow.icmp.id)
            memcpy(&hash, &flow.icmp, sizeof(hash));
        else
            memcpy(&hash, &flow.ports.ports, sizeof(hash));
    }
    return bond_ip_hash(hash, &flow, bond->params.xmit_policy);
}
/* L2 hash helper */
static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
{
    struct ethhdr *ep;
    data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
    if (!data)
        return 0;
    ep = (struct ethhdr *)(data + mhoff);
    return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
}
static u32 bond_ip_hash(u32 hash, struct flow_keys *flow, int xmit_policy)
{
    hash ^= (__force u32)flow_get_u32_dst(flow) ^
            (__force u32)flow_get_u32_src(flow);
    hash ^= (hash >> 16);
    hash ^= (hash >> 8);
    /* discard lowest hash bit to deal with the common even ports pattern */
    if (xmit_policy == BOND_XMIT_POLICY_LAYER34 ||
        xmit_policy == BOND_XMIT_POLICY_ENCAP34)
        return hash >> 1;
    return hash;
}
static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
{
    struct ethhdr *mac_hdr;
    u32 srcmac_vendor = 0, srcmac_dev = 0;
    u16 vlan;
    int i;
    data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
    if (!data)
        return 0;
    mac_hdr = (struct ethhdr *)(data + mhoff);
    for (i = 0; i < 3; i++)
        srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
    for (i = 3; i < ETH_ALEN; i++)
        srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
    if (!skb_vlan_tag_present(skb))
        return srcmac_vendor ^ srcmac_dev;
    vlan = skb_vlan_tag_get(skb);
    return vlan ^ srcmac_vendor ^ srcmac_dev;
}
Here, the flow structure (struct flow_keys) holds the header information actually found in the packet, such as the IP addresses and ports.
The BOND_XMIT_POLICY_ENCAP23 and BOND_XMIT_POLICY_ENCAP34 policies work like the normal layer2+3 and layer3+4 xmit policies, but they can also parse an encapsulated packet and read its inner IP and port headers for hashing.
The following shows the hash computation used to select the sending interface for each xmit_hash_policy:
Assumed Topology
----------------
Server
  bond0
    MAC: 00:1b:21:74:b6:39
    IP : 169.254.92.64 = 0xA9FE5C40
    UDP: src port 12243 = 0x2FD3
    packet type ID: 0x0800 (IPv4)
    NIC_Count = 2
    NIC0 assigned # value: 0
    NIC1 assigned # value: 1
Destination
  Client1
    MAC: 00:1a:22:12:34:59
    IP : 192.168.1.10 = 0xC0A8010A
    UDP: dst port 42424 = 0xA5B8
  Client2
    MAC: 00:1e:c1:07:45:1A
    IP : 192.168.100.24 = 0xC0A86418
    UDP: dst port 42424 = 0xA5B8
Mode Behaviour
--------------
1. layer2:
Hash = ( SRC_MAC[5] ^ DST_MAC[5] ^ packet ID ) % NIC_Count
Server --> Client1
Hash = ((0x0039 ^ 0x0059) ^ 0x0800) % 2 = 0 ---> send packet through NIC0
Server --> Client2
Hash = ((0x0039 ^ 0x001A) ^ 0x0800) % 2 = 1 ---> send packet through NIC1
2. layer2+3:
hash = source MAC XOR destination MAC XOR packet type ID
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
slave number = hash modulo slave count
Server --> Client1
hash = ((0x0039 ^ 0x0059) ^ 0x0800) = 0x0860
hash = 0x0860 ^ (0xA9FE5C40 ^ 0xC0A8010A) = 0x6956552A
hash = 0x6956552A ^ (0x6956552A >> 16) = 0x69563C7C
hash = 0x69563C7C ^ (0x69563C7C >> 8) = 0x693F6A40
slave number = 0x693F6A40 % 2 = 0 ---> send packet through NIC0
Server --> Client2
hash = ((0x0039 ^ 0x001A) ^ 0x0800) = 0x0823
hash = 0x0823 ^ (0xA9FE5C40 ^ 0xC0A86418) = 0x6956307B
hash = 0x6956307B ^ (0x6956307B >> 16) = 0x6956592D
hash = 0x6956592D ^ (0x6956592D >> 8) = 0x693F0F74
slave number = 0x693F0F74 % 2 = 0 ---> send packet through NIC0
3. layer3+4:
hash = source port , destination port (as in the header)
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
hash = hash RSHIFT 1
Server --> Client1
hash = (0x2FD3 , 0xA5B8) = 0x2FD3A5B8
hash = 0x2FD3A5B8 ^ (0xA9FE5C40 ^ 0xC0A8010A) = 0x4685F8F2
hash = 0x4685F8F2 ^ (0x4685F8F2 >> 16) = 0x4685BE77
hash = 0x4685BE77 ^ (0x4685BE77 >> 8) = 0x46C33BC9
hash = 0x46C33BC9 >> 1 = 0x23619DE4
slave number = 0x23619DE4 % 2 = 0 ---> send packet through NIC0
Server --> Client2
hash = (0x2FD3 , 0xA5B8) = 0x2FD3A5B8
hash = 0x2FD3A5B8 ^ ( 0xA9FE5C40 ^ 0xC0A86418 ) = 0x46859DE0
hash = 0x46859DE0 ^ (0x46859DE0 >> 16) = 0x4685DB65
hash = 0x4685DB65 ^ (0x4685DB65 >> 8) = 0x46C35EBE
hash = 0x46C35EBE >> 1 = 0x2361AF5F
slave number = 0x2361AF5F % 2 = 1 ---> send packet through NIC1
4. vlan+srcmac
Consider a bond that has VLAN interfaces with VLAN IDs 100 and 101.
hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
Server with VLAN 100 (0x64) --> Client1
hash = 0x64 ^ 0x001B21 ^ 0x74B639 = 0x74AD7C
slave number = 0x74AD7C % 2 = 0 ---> send packet through NIC0
Server with VLAN 101 (0x65) --> Client1
hash = 0x65 ^ 0x001B21 ^ 0x74B639 = 0x74AD7D
slave number = 0x74AD7D % 2 = 1 ---> send packet through NIC1