Received packets (e.g. ICMP echo) intermittently show latency of around 190 ms
Environment
- Red Hat OpenStack Platform 17.1
- Red Hat Enterprise Linux 9
Issue
- Received packets intermittently show latency. For example, `ping` occasionally reports around 200 ms, whereas the usual round-trip time is less than 1 ms.
- The same issue is observed for other packets with OVS-DPDK.
- There is a filter rule in nft or iptables that has the LOG action.
- The kernel command line (`/proc/cmdline`) contains `console=ttyS0`.
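The last two conditions can be checked on a node with a rough sketch like the one below. The function names `has_serial_console` and `has_log_rule` are hypothetical helpers, not part of any Red Hat tooling, and the LOG match is only a text heuristic over `nft list ruleset` or `iptables-save` output.

```shell
# Hypothetical helpers; both take the text to inspect as their first argument.

# Succeeds if the kernel command line names a serial console (console=ttyS<N>).
has_serial_console() {
    printf '%s\n' "$1" | grep -Eq '(^|[[:space:]])console=ttyS[0-9]+'
}

# Succeeds if the ruleset text contains an nft "log" statement or an
# iptables "-j LOG" target. Heuristic only: it does not parse the rules.
has_log_rule() {
    printf '%s\n' "$1" | grep -Eq -e '(^|[[:space:]])log([[:space:]]|$)' -e '-j LOG([[:space:]]|$)'
}

# Example usage on a node (run as root so the ruleset is readable):
#   has_serial_console "$(cat /proc/cmdline)" && echo "serial console configured"
#   has_log_rule "$(nft list ruleset)" && echo "LOG rule present"
```

If both checks succeed on a node that shows the latency, the conditions described in this article are likely present.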
Resolution
- As a workaround, perform the following steps.
- Replace `console=ttyS0` on the kernel command line (`/proc/cmdline`) with `console=tty0`, and remove `GRUB_SERIAL_COMMAND` (if it exists).
a. This example assumes the following /etc/default/grub file.
```
GRUB_CMDLINE_LINUX="console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M"
GRUB_CMDLINE_LINUX_DEFAULT=" console=ttyS0,115200 no_timer_check memtest=0 boot=LABEL=mkfs_boot"
```
b. Create the following YAML file, remove_serial_console.yaml.
```
# remove_serial_console.yaml
---
- name: Remove serial console and add console
  hosts: allovercloud
  any_errors_fatal: true
  gather_facts: false
  tasks:
    # Rewrite GRUB_CMDLINE_LINUX without the console=ttyS0 entries.
    - name: Remove serial console from GRUB_CMDLINE_LINUX
      become: true
      lineinfile:
        path: /etc/default/grub
        line: 'GRUB_CMDLINE_LINUX="no_timer_check crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M"'
        regexp: '^GRUB_CMDLINE_LINUX="c.*'
        insertafter: '^GRUB_TERMINAL_OUTPUT="console"'
    # Rewrite GRUB_CMDLINE_LINUX_DEFAULT with console=tty0 instead of the serial console.
    - name: Remove serial console from GRUB_CMDLINE_LINUX_DEFAULT
      become: true
      lineinfile:
        path: /etc/default/grub
        line: 'GRUB_CMDLINE_LINUX_DEFAULT=" console=tty0 no_timer_check memtest=0 boot=LABEL=mkfs_boot"'
        regexp: '^GRUB_CMDLINE_LINUX_DEFAULT=".*'
        insertafter: '^GRUB_GFXPAYLOAD_LINUX=auto'
    # Regenerate both the BIOS and the EFI grub.cfg from the updated defaults.
    - name: Recreate /boot/grub2/grub.cfg
      become: true
      command: "grub2-mkconfig -o /boot/grub2/grub.cfg"
    - name: Recreate /boot/efi/EFI/redhat/grub.cfg
      become: true
      command: "grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg"
```
c. Run the following command on the undercloud node to update grub.cfg on all nodes.
```
$ ansible-playbook -i overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml ./remove_serial_console.yaml
```
d. Reboot all nodes and confirm in `/proc/cmdline` that the `console=ttyS0` entries are removed.
- Set the `kernel.printk` parameter to `0 4 0 0` via sysctl.
a. Add the following entry to your environment file.
```
parameter_defaults:
  ExtraSysctlSettings:
    kernel.printk:
      value: "0 4 0 0"
```
b. Run the `overcloud deploy` command, including this file together with the other environment files used in previous deployments.
- Remove the netfilter LOG rules. A manually removed LOG rule is restored by every `overcloud deploy` run. To prevent the LOG rule from being restored, override the default rules as follows.
a. Create an environment file like the following example. The `{{role.name}}ExtraGroupVars` parameter names depend on the roles used in your deployment.
```
parameter_merge_strategies:
  ControllerExtraGroupVars: merge
  ComputeExtraGroupVars: merge
parameter_defaults:
  ControllerExtraGroupVars:
    tripleo_firewall_default_rules: {
      '000 accept related established rules': {proto: all, state: ["RELATED", "ESTABLISHED"]},
      '001 accept all icmp': {ipversion: ipv4, proto: icmp},
      '001 accept all ipv6-icmp': {ipversion: ipv6, proto: ipv6-icmp},
      '002 accept all to lo interface': {proto: all, interface: lo},
      '004 accept ipv6 dhcpv6': {ipversion: ipv6, dport: 546, proto: udp, state: NEW, destination: 'fe80::/64'},
      '999 drop all': {proto: all, action: drop}}
  ComputeExtraGroupVars:
    tripleo_firewall_default_rules: {
      '000 accept related established rules': {proto: all, state: ["RELATED", "ESTABLISHED"]},
      '001 accept all icmp': {ipversion: ipv4, proto: icmp},
      '001 accept all ipv6-icmp': {ipversion: ipv6, proto: ipv6-icmp},
      '002 accept all to lo interface': {proto: all, interface: lo},
      '004 accept ipv6 dhcpv6': {ipversion: ipv6, dport: 546, proto: udp, state: NEW, destination: 'fe80::/64'},
      '999 drop all': {proto: all, action: drop}}
```
b. Run the `overcloud deploy` command with the environment file above.
- When NIC interfaces are not managed by os-net-config (the network scripts have `NM_CONTROLLED=yes`), NetworkManager triggers automatic configuration via DHCP. The DHCP Discover packet is a broadcast packet, so it is handled by the LOG rule on every node that receives it. To prevent this behavior, configure either of the following options.
a. Define the unused devices in the NIC templates with `use_dhcp` disabled. This is applied by os-net-config, which is triggered by `openstack overcloud node provision`. The following example disables the network configuration for an SRIOV VF (update the device name and vfid according to your deployment). **NOTE**: If the VF devices are created for PCI passthrough to VMs, do not use this method. Use the NetworkManager configuration described in the other option instead.
```
- type: sriov_vf
  device: <device name>
  onboot: false
  vfid: <vf_id>
  use_dhcp: false
  use_dhcpv6: false
  defroute: false
  dns_servers: []
  domain: []
```
b. Define the following NetworkManager configuration to exclude the devices from NetworkManager's management. In this example, the target devices are ensXvY (SRIOV VF devices).
```
# /etc/NetworkManager/conf.d/99-sriov-unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:ens*v*
```
**NOTE**: To make the configuration effective, NetworkManager needs to be restarted.
- If the system uses NVIDIA ConnectX cards for OVS-DPDK and the packet latency occurs on that interface, run `ovs-vsctl set interface <dpdkbond interface name> options:dpdk-lsc-interrupt=true` against the bonding device.
- In RHOSP 17.1, the firewall rule that has the LOG action is being removed in RHBZ#2293382, and the `console=ttyS0` entry is being removed from the overcloud images in RHBZ#2293368.
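The `kernel.printk` setting in the workaround relies on how the kernel filters console output: a message is printed on the console only when its level is numerically lower than the first `kernel.printk` field (`console_loglevel`). The sketch below illustrates that comparison. The helper names are hypothetical, and level 4 (warning) is assumed for the LOG rule, which is the usual default but may differ in your ruleset.

```shell
# Hypothetical helper: print the first field (console_loglevel) of a
# kernel.printk value such as "0 4 0 0".
console_loglevel() {
    set -- $1            # split the value on whitespace (spaces or tabs)
    printf '%s\n' "$1"
}

# Succeeds if a message of level $1 (0-7) would be printed on the console
# under the kernel.printk value $2.
reaches_console() {
    [ "$1" -lt "$(console_loglevel "$2")" ]
}

# Example usage on a node:
#   reaches_console 4 "$(cat /proc/sys/kernel/printk)" && echo "level 4 messages reach the console"
```

With `0 4 0 0`, no level is lower than `console_loglevel`, so under this rule the LOG messages are still recorded in the kernel log but are not written to the console.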
Root Cause
- When a packet arrives at the NIC, the NIC driver picks it up and pushes it into the kernel network stack as an skb. With OVS-DPDK, the PMD threads pick packets from the NIC and push them to the bridge interface in OVS. OVS has a tap device, which is a kernel device, so a packet pushed by a PMD thread to the OVS tap device is processed in the same way as a packet from a regular driver. While the skb is processed, starting from netif_receive_skb(), the nft framework examines the packet against the registered rules. If the system has an nft or iptables rule with the LOG action and the kernel command line contains `console=ttyS0`, the log message for each matched packet is written to the serial console. Writing a log message to the serial console is an expensive, blocking operation that takes almost 190 ms. While the message is being written, the NIC driver or the Poll Mode Driver (PMD) thread in OVS-DPDK is blocked, because it is waiting for the hand-off of the packet to the kernel to complete. As a result, packets received by the NIC during this time are delivered late, which introduces the latency.
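To get a feel for why a serial console write can take this long, the following back-of-the-envelope sketch estimates the time needed to push a message through a UART, assuming 8N1 framing (10 bits on the wire per byte). The function name, message size, and baud rates are illustrative assumptions; this article does not state the baud rate or message length in effect on an affected node.

```shell
# Hypothetical helper: estimated transmit time in whole milliseconds
# for $1 bytes at $2 baud, assuming 8N1 framing (10 bits per byte).
serial_tx_ms() {
    echo $(( $1 * 10 * 1000 / $2 ))
}

serial_tx_ms 192 9600      # prints 200
serial_tx_ms 192 115200    # prints 16
```

Under these assumptions, a log line of roughly 190 bytes at 9600 baud lands in the same range as the latency described above, while the same line at 115200 baud would take under 20 ms.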