I/O aborts on RHV virtual systems kernel: [0000:2b:00.2]:[qedf_eh_abort:xxxx]:1: Aborting io_req=ff5d85a9dcf3xxxx

Solution Unverified

Environment

  • Red Hat Enterprise Linux 8
  • RHV 4.4 host

Issue

  • I/O aborts on RHV virtual systems:

    kernel: [qed_dcbx_process_tlv:389(ens3f0-0)]Invalid priority
    kernel: [qed_dcbx_process_tlv:389(ens3f4-0)]Invalid priority
    kernel: [qed_dcbx_process_tlv:389(host_1-0)]Invalid priority
    kernel: [0000:2b:00.2]:[qedf_dcbx_handler:894]:1: DCBx event valid=1 enabled=1 fcoe prio=3.
    kernel: [0000:2b:00.2]:[qedf_eh_abort:1131]:1: Aborting io_req=ff5d85a9dcf36688 sc_cmd=ff43a86408d004f8 xid=0x602 fp_idx=0, port_id=78acc1.
    kernel: [0000:2b:00.2]:[qedf_eh_abort:1170]:1: ABTS succeeded, xid=0x602.
    kernel: [qed_dcbx_process_tlv:389(ens3f0-0)]Invalid priority
    kernel: [qed_dcbx_process_tlv:389(ens3f4-0)]Invalid priority
    kernel: [qed_dcbx_process_tlv:389(host_1-0)]Invalid priority
    ....
    2b:00.0 Ethernet controller [0200]: QLogic Corp. FastLinQ QL45000 Series 50GbE Controller [1077:1654] (rev 10)
    2b:00.1 Ethernet controller [0200]: QLogic Corp. FastLinQ QL45000 Series 50GbE Controller [1077:1654] (rev 10)
    2b:00.2 Ethernet controller [0200]: QLogic Corp. FastLinQ QL45000 Series 10/25/40/50GbE Controller (FCoE) [1077:165c] (rev 10)
    2b:00.3 Ethernet controller [0200]: QLogic Corp. FastLinQ QL45000 Series 10/25/40/50GbE Controller (FCoE) [1077:165c] (rev 10)
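
To check whether a host shows the same pattern, the kernel log can be filtered for the qedf abort and DCBx messages quoted above. The following is a minimal sketch and not part of the original solution. The function names count_qedf_aborts and filter_dcbx_lines are hypothetical, and the sketch assumes the log text is supplied on standard input, for example from journalctl -k.

```shell
#!/bin/bash
# Hypothetical helper: count qedf abort messages in kernel log text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so the exit code is ignored.
count_qedf_aborts() {
    grep -c 'qedf_eh_abort:.*Aborting io_req=' || true
}

# Hypothetical helper: print only the qedf/qed DCBx-related lines read from stdin.
filter_dcbx_lines() {
    grep -E 'qedf_eh_abort|qedf_dcbx_handler|qed_dcbx_process_tlv' || true
}

# Example usage on a live host (commented out so the block has no side effects):
# journalctl -k | count_qedf_aborts
# journalctl -k | filter_dcbx_lines
```

A non-zero abort count alongside repeated "Invalid priority" messages would match the symptom described in this section. It does not by itself confirm the root cause.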
    

Resolution

  • To alleviate the issue, the lldpad service needs to be stopped and the system rebooted.

    # systemctl mask lldpad.service
    # systemctl disable lldpad.socket
    
  • An alternative solution is to create a customized supervdsmd systemd service file. Create or edit /etc/systemd/system/supervdsmd.service and ensure it has the contents below. Run systemctl daemon-reload so that systemd picks up the changed unit file, then reload the service to apply: systemctl reload supervdsmd.service.

    # cat /etc/systemd/system/supervdsmd.service
    After=libvirtd.service
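
For the first option (masking lldpad), the result can be checked after the reboot by asking systemd for the unit state. The following is a minimal sketch, not part of the original solution. The helper name lldpad_state_ok is hypothetical, and it only interprets the state string printed by systemctl is-enabled. A "disabled" unit can still be started by another unit or by hand, so this is a rough check only.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a state string reported by
# `systemctl is-enabled` means the unit is masked or not enabled at boot.
lldpad_state_ok() {
    case "$1" in
        masked|masked-runtime|disabled) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a live host (commented out so the block has no side effects):
# for unit in lldpad.service lldpad.socket; do
#     state=$(systemctl is-enabled "$unit" 2>/dev/null)
#     if lldpad_state_ok "$state"; then
#         echo "$unit: $state"
#     else
#         echo "$unit: $state (still enabled?)"
#     fi
# done
```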
    

Root Cause

  • This issue appears to have started with newer out-of-box drivers:

    filename:       /lib/modules/4.18.0-348.20.1.el8_5.x86_64/weak-updates/qlgc-fastlinq/qedf.ko
    version:        8.59.6.0
    author:         Cavium Inc.
    description:    QLogic FastLinQ 4xxxx FCoE Module
    license:        GPL
    rhelversion:    8.5
    
  • lldpad is enabled by default on RHV, and the newer out-of-box drivers end up disabling PFC (Priority Flow Control). This combination has been known to cause lost frames and aborts. Earlier inbox drivers did not clash with PFC; the changed code in the newer out-of-box driver does.
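
Whether a host is running the inbox or an out-of-box qedf module can usually be read from the module path, as in the modinfo output above, where the file sits under weak-updates. The following is a minimal sketch, not part of the original solution. The function name classify_module_path is hypothetical. It assumes the common layout in which inbox modules live under the kernel/ directory of /lib/modules/<version>/ and externally supplied modules under weak-updates/, extra/ or updates/.

```shell
#!/bin/bash
# Hypothetical helper: classify a kernel module by its file path,
# as printed by `modinfo -n <module>`.
classify_module_path() {
    case "$1" in
        */weak-updates/*|*/extra/*|*/updates/*) echo "out-of-box" ;;
        */kernel/*) echo "inbox" ;;
        *) echo "unknown" ;;
    esac
}

# Example usage on a live host (commented out so the block has no side effects):
# classify_module_path "$(modinfo -n qedf)"
# modinfo -F version qedf
```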
