Feature Integration document - DCB for E2E QoS
Note: This is a Technical Preview feature in the RHOSO 18.0 and RHOSP 17.1.4 releases.
Introduction to this feature
In traditional Ethernet networks, packets may be dropped or discarded when network congestion occurs. However, in networks where critical traffic coexists with regular data traffic, frame loss is unacceptable. Data Center Bridging (DCB) is a set of Ethernet enhancements specifically designed to provide a lossless and deterministic networking environment for data centers and converged networks. DCB comprises several key features and standards that work together to improve the reliability and performance of Ethernet networks in these environments.
Understanding Data Center Bridging (DCB)
DCB is a set of enhancements to the Ethernet standards designed to provide lossless, low-latency communication for storage and data traffic, primarily in data center networks. It encompasses various protocols and technologies to achieve these goals, with one of its key components being Enhanced Transmission Selection (ETS).
Enhanced Transmission Selection (ETS)
ETS is a feature within DCB that allows for the allocation of bandwidth among different traffic classes. It ensures that critical data and other high-priority traffic receive the necessary resources without contention from less time-sensitive traffic. ETS operates by dividing the available bandwidth into priority groups, each assigned a specific share of the total bandwidth.
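As an illustration, the bandwidth sharing described above can be sketched as a work-conserving weighted allocator: each traffic class is guaranteed its configured share, and capacity left unused by under-demanding classes is redistributed to the remaining classes in proportion to their weights. This is a minimal sketch of the ETS idea, not the actual switch or NIC scheduler; the function name, weights, and demands are illustrative.

```python
def ets_allocate(link_mbps, weights, demands):
    """Sketch of ETS-style bandwidth sharing.

    weights: traffic class -> configured bandwidth percentage
    demands: traffic class -> offered load in Mbps
    Each class is guaranteed its weighted share; capacity left over by
    classes that demand less is redistributed to the still-active classes
    in proportion to their weights (work-conserving behavior).
    """
    alloc = {tc: 0.0 for tc in weights}
    remaining = dict(demands)
    leftover = float(link_mbps)
    active = {tc for tc in weights if remaining.get(tc, 0) > 0}
    while active and leftover > 1e-9:
        total_w = sum(weights[tc] for tc in active)
        shares = {tc: leftover * weights[tc] / total_w for tc in active}
        leftover = 0.0
        for tc in list(active):
            taken = min(shares[tc], remaining[tc])
            alloc[tc] += taken
            remaining[tc] -= taken
            leftover += shares[tc] - taken
            if remaining[tc] <= 1e-9:
                active.discard(tc)
    return alloc

# On a 10 Gbps link with 20/20/60 weights, a class that only offers
# 1000 Mbps frees the rest of its share for the busier classes.
alloc = ets_allocate(10000, {0: 20, 1: 20, 2: 60},
                     {0: 1000, 1: 5000, 2: 10000})
```

Note that the unused 1000 Mbps of TC 0's guaranteed share is split 20:60 between TCs 1 and 2, which is the behavior that makes ETS attractive for converged traffic.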
LLDP TLVs in ETS
Link Layer Discovery Protocol (LLDP) is used to discover and advertise information about neighboring devices. In the context of ETS, LLDP includes specific TLVs (Type-Length-Value) that convey information about the capabilities and requirements of a device regarding Enhanced Transmission Selection. This enables devices in the network to negotiate and establish optimal configurations for ETS.
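For reference, an LLDP TLV packs a 7-bit type and a 9-bit length into a two-byte header, followed by the value; DCBX ETS information is carried in organizationally specific TLVs (type 127) whose value begins with an OUI and subtype. The following is a minimal sketch of that framing only; the ETS payload layout itself is defined by IEEE 802.1Qaz and is not reproduced here.

```python
import struct

def encode_tlv(tlv_type, value):
    # LLDP TLV header: 7-bit type and 9-bit length in two bytes, network order
    header = (tlv_type << 9) | (len(value) & 0x1FF)
    return struct.pack("!H", header) + value

def decode_tlv(buf):
    # Returns (type, value, bytes consumed) for the TLV at the head of buf
    (header,) = struct.unpack("!H", buf[:2])
    tlv_type = header >> 9
    length = header & 0x1FF
    return tlv_type, buf[2:2 + length], 2 + length

# Organizationally specific TLV (type 127); the value here is just the
# IEEE 802.1 OUI (00-80-C2) plus a subtype byte, with the ETS fields omitted
tlv = encode_tlv(127, bytes([0x00, 0x80, 0xC2, 0x09]))
```

Decoding the same bytes recovers the type and value, which is all a receiver needs to hand the payload to its DCBX state machine.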
DCBX and Its Role
Data Center Bridging Exchange (DCBX) is a protocol that facilitates the exchange of DCB configuration information between neighboring devices. It plays a crucial role in the proper functioning of ETS by ensuring that all devices in the network are aware of the ETS configurations and can align their priorities accordingly. DCBX enables devices to negotiate and establish a common understanding of bandwidth allocations and traffic priorities.
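The negotiation outcome can be illustrated with DCBX's "willing" semantics: a device that advertises itself as willing adopts the ETS configuration recommended by its peer (typically the switch), while an unwilling device keeps its local configuration. This is a simplified sketch of that resolution logic; the dictionaries are illustrative, not an os-net-config or switch API.

```python
def resolve_ets(local, peer):
    """Simplified DCBX-style resolution of the operational ETS config.

    A 'willing' device accepts the configuration advertised by a
    non-willing peer; otherwise it keeps its own local settings.
    """
    if local["willing"] and not peer["willing"]:
        return peer["ets"]
    return local["ets"]

# The switch advertises 20/20/60 and is not willing; the willing host
# therefore adopts the switch's bandwidth table.
switch = {"willing": False, "ets": {0: 20, 1: 20, 2: 60}}
host = {"willing": True, "ets": {0: 50, 1: 50}}
effective = resolve_ets(host, switch)
```

This matches the common deployment pattern in this document: ETS is configured once on the TOR switch, and the host learns it via DCBX rather than being configured independently.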
How LLDP TLVs and DCBX Collaborate for ETS
- LLDP TLV initialization: devices use LLDP TLVs to communicate their ETS requirements and capabilities during the initialization phase.
- Negotiation via DCBX: DCBX takes the LLDP TLV information and facilitates negotiation between devices to establish a common ETS configuration.
- Dynamic adjustments: as network conditions change, LLDP TLVs and DCBX allow for dynamic adjustments to ETS configurations, ensuring optimal performance.
DCB and ETS provide the foundation for efficient and reliable communication in converged networks. LLDP TLVs and DCBX together ensure that Enhanced Transmission Selection operates seamlessly, adapting to the dynamic demands of modern networking.
Configuration - How to use this feature?
Switch Side (TOR Switch)
Configure ETS on the switch, after which the switch sends the configuration to the host/server using LLDP DCBX TLVs.
Configuring ETS on the switch: these steps differ depending on the switch, so refer to the appropriate documentation for the switch being used.
HOST Side
This is applicable to greenfield deployments [Day 1 operation]. For brownfield deployments, see the Day 2 Configuration section.
For details on the deployment steps and configuration, check the links below to understand where the os-net-config configuration is applied:
- RHOSO 18.0 EDPM Values
- RHOSP 17.1 TRIPLEO Network Config
The above links are examples only; depending on your requirements, the configuration may live in a different path. Wherever you see CHANGEME in the sample configurations below, change the value to match your setup and deployment.
DSCP to priority mappings
This configuration needs to be done on the bare-metal host of the compute node, and must be applied on the physical interface (NIC). os-net-config provides this functionality: it runs during the deployment phase and configures the mappings as required. Persistence across reboots is also managed by a service.
A sample os-net-config configuration for an object of type interface is shown below:
- type: ovs_user_bridge
  name: br-link0
  use_dhcp: false
  defroute: false
  mtu: 9000
  members:
    - type: ovs_dpdk_port
      name: dpdk-enp4s0f0np0  # CHANGEME
      rx_queue: 4  # CHANGEME
      driver: mlx5_core
      members:
        - type: interface
          name: enp4s0f0np0  # CHANGEME
          dcb:
            dscp2prio:
              # Add the dscp configs. Each entry requires a
              # priority and a protocol (DSCP value).
              - priority: 2  # CHANGEME
                protocol: 24  # CHANGEME
              - priority: 3  # CHANGEME
                protocol: 8  # CHANGEME
              - priority: 4  # CHANGEME
                protocol: 12  # CHANGEME
An additional configuration done on the compute node is shown below. To verify minimum QoS enforcement, the available resources must first be exhausted; the setting below was needed to drive traffic close to line rate and exhaust the resources so that minQoS enforcement could be observed.
# ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
Day 2 Configuration
Ideally, the host-side configuration should be put in place at deployment time. However, if it needs to be configured after deployment, os-net-config provides a utility that can be run on the required compute nodes. Usage is explained below.
Note: With Day 2 configuration,
- NIC numbering is not supported; the physical NIC name must be used to configure the device in the config file.
- WARNING: os-net-config should be the host configuration tool. If nmstate is used directly, this won't work.
Prepare a sample dcb_config.yaml as below:
dcb_config:
  - type: dcb
    device: ens1f0np0  # CHANGEME
    dscp2prio:
      - priority: 5  # CHANGEME
        protocol: 40  # CHANGEME
      - priority: 4  # CHANGEME
        protocol: 34  # CHANGEME
Apply the dcb_config.yaml using the CLI as below:
$ sudo os-net-config-dcb --config /path/to/dcb_config.yaml
DSCP marking of packets
For this feature to work reliably, it is very important that packets are marked properly. There are two possibilities: packets either arrive from the application already marked, or they do not. In the former case there is nothing to worry about; in the latter case, which per customer feedback applies to many applications, appropriate Neutron QoS policies need to be put in place to mark the packets accordingly.
Sample commands to create the neutron qos-policy are as below:
# neutron qos-policy-create <POLICY_NAME>
# neutron qos-dscp-marking-rule-create <POLICY_NAME> --dscp-mark <dscp_val>
# neutron port-update <PORT_UUID> --qos-policy <POLICY_NAME>
OR
# openstack network qos policy create <POLICY_NAME>
# openstack network qos rule create --type dscp-marking --dscp-mark <dscp_val> <POLICY_NAME>
# openstack port set --qos-policy <POLICY_NAME> <port_uuid>
Note: DSCP marking of packets using neutron qos policy works for ovs-dpdk deployments. But it is NOT applicable for SR-IOV deployments. With SR-IOV deployments, applications themselves need to make sure the packets are marked with DSCP appropriately for this feature to work.
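For the SR-IOV case, where the application must mark its own traffic, one common approach on Linux is to set the IP_TOS socket option, since the DSCP field occupies the upper six bits of the IP TOS byte. This is a minimal sketch under that assumption; the DSCP value 24 is chosen to match the sample dscp2prio mapping earlier in this document, and whether an application can do this depends on how it manages its sockets.

```python
import socket

DSCP = 24        # illustrative DSCP value, matching the sample mapping
tos = DSCP << 2  # DSCP is the upper 6 bits of the IPv4 TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# All IPv4 datagrams sent on this socket will now carry DSCP 24
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
```

For IPv6 sockets the equivalent is the IPV6_TCLASS option; either way, the marking happens in the application, so no Neutron QoS policy is involved.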
Impact of marking packets from Neutron
A test was performed on a standalone OSP deployment with a single VM sending traffic from an iperf client: once with pre-marked traffic, and once with unmarked traffic but a Neutron QoS policy set to take care of the marking. In both cases the rate reached was about the same, so we can be assured that Neutron marking has no noticeable impact on performance.
Debuggability
Note that for this feature to work well, the switch-side configuration must be set properly. The procedure to configure DCB on a switch differs across vendors, so customers need to work with the corresponding switch vendor to obtain the necessary configuration and apply it appropriately.
Once the configuration is done, we can verify from the OSP side that the configs are in place and reaching the compute node, as below.
Let's say the TOR switch is configured with 3 traffic classes, with bandwidths of 20%, 20% and 60% for TCs 0, 1 and 2 respectively.
Further, on the physical NIC of the host (compute node), DSCP values 24, 12 and 8 are mapped to priorities 2, 4 and 3 respectively.
Check with the os-net-config CLI as below. Notice in the output that the bandwidth allocation is shown as 20%, 20% and 60%, as configured from the switch side, and that the priority queues mapped to the different traffic classes are displayed. The DSCP to priority queue mappings configured by os-net-config are also printed. This gives sufficient information to debug in case of any issues, or simply to cross-check that the switch-side and host-side configs are in place.
$ sudo os-net-config-dcb --show
2024-01-29 12:41:25.539 INFO os_net_config.show -----------------------------
2024-01-29 12:41:25.539 INFO os_net_config.show Interface: enp4s0f0np0
2024-01-29 12:41:25.539 INFO os_net_config.show DCBX Mode : FW Controlled
2024-01-29 12:41:25.539 INFO os_net_config.show Trust mode: dscp
2024-01-29 12:41:25.539 INFO os_net_config.show dscp2prio mapping: prio:2 dscp:24 prio:3 dscp:08 prio:4 dscp:12
2024-01-29 12:41:25.540 INFO os_net_config.show tc: 0 , tsa: ets, bw: 20%, priority: 3 7
2024-01-29 12:41:25.540 INFO os_net_config.show tc: 1 , tsa: ets, bw: 20%, priority: 0 4
2024-01-29 12:41:25.540 INFO os_net_config.show tc: 2 , tsa: ets, bw: 60%, priority: 1 2 5 6
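As a cross-check of the output above, the chain from DSCP value to bandwidth share can be followed programmatically. The sketch below hard-codes the example tables from this section (the host's dscp2prio mapping plus the switch's traffic-class assignments); it is illustrative only, not something os-net-config emits.

```python
# Example tables transcribed from the os-net-config-dcb --show output above
dscp2prio = {24: 2, 8: 3, 12: 4}            # host: DSCP -> priority queue
prio2tc = {3: 0, 7: 0,                      # switch: priority -> traffic class
           0: 1, 4: 1,
           1: 2, 2: 2, 5: 2, 6: 2}
tc_bandwidth = {0: 20, 1: 20, 2: 60}        # switch: TC -> ETS bandwidth %

def bandwidth_for_dscp(dscp):
    """Resolve which ETS bandwidth share traffic with this DSCP lands in."""
    prio = dscp2prio[dscp]
    tc = prio2tc[prio]
    return tc_bandwidth[tc]
```

For example, DSCP 24 maps to priority 2, which the switch places in TC 2, so that traffic falls under the 60% bandwidth share.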
Important points for consideration
- This feature is a Technical Preview feature.
- This feature is tested and verified on a host/compute node with a Mellanox NIC (ConnectX-5):
  - With ovs-dpdk in non-SR-IOV mode [i.e. os-net-config objects of type interface were used for testing]
  - With SR-IOV mode [i.e. os-net-config objects of type sriov_pf were used for testing]
- Limitations
  - With Intel NICs there is an open issue (in the Intel NIC drivers) which is being actively pursued. Hence, for the 18.0 and 17.1.4 releases, this feature doesn't work with Intel NICs. More on the Intel issue: currently, when firmware LLDP is enabled, i40e drivers don't accept NETLINK messages to set the DSCP to priority queue mapping.
  - With multi queue enabled on the VM side (i.e. with hw_vif_multiqueue_enabled=true), this feature was observed to have issues. While testing minQoS enforcement with DCB, it is important to be able to exhaust the resources first in order to see its effectiveness; with multiqueue, the resources could not be exhausted during testing. This needs further investigation, so it is recommended not to enable multi queue while using the DCB feature.
  - Currently, the dcb configs can be applied only on os-net-config objects of type sriov_pf and interface. Hence, if the user intends to configure DCB for a bond, the same dcb configs need to be applied under all bond members.