Bonding driver uses 100% of the CPU and causes a soft lockup to occur when using mode 4 (802.3ad)

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 5.2
  • Bonded network interfaces using bonding mode 4 (802.3ad)

Issue

When using bonding mode 4 (802.3ad), the bonding driver may consume 100% of a CPU, and messages like the following appear in the kernel log:

      kernel: BUG: soft lockup - CPU#4 stuck for 10s! [bond0:6706]
      mtaiapp02 kernel: CPU 4:
      [ ... ]
      kernel: Pid: 6706, comm: bond0 Tainted: PF     2.6.18-92.1.1.el5 #1
      kernel: RIP: 0010:[]  [] .text.lock.spinlock+0x5/0x30
      kernel: RSP: 0018:ffff81082fe8fcb8  EFLAGS: 00000286
      [ ... ]
      kernel: Call Trace:
      kernel:    [] :bonding:ad_rx_machine+0x20/0x502
      kernel:  [] :bonding:bond_3ad_lacpdu_recv+0xc1/0x1fc
      kernel:  [] :bonding:bond_3ad_lacpdu_recv+0x1eb/0x1fc
      kernel:  [] netif_receive_skb+0x330/0x3ae
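
To check whether a host's logs contain this signature, the log can be scanned for soft lockup lines attributed to a bonding thread. The following is a minimal sketch and not part of the original solution: the function name and the regular expression are assumptions modelled on the sample message above, and the exact wording of the soft lockup line may differ on other kernel versions.

```python
import re

# Pattern modelled on the sample message in this article (assumption:
# other kernel versions may word the soft lockup line differently).
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_bond_soft_lockups(lines):
    """Return (cpu, seconds, comm, pid) tuples for soft lockups whose
    task name starts with 'bond' (hypothetical helper for illustration)."""
    hits = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m and m.group("comm").startswith("bond"):
            hits.append(
                (int(m.group("cpu")), int(m.group("secs")),
                 m.group("comm"), int(m.group("pid")))
            )
    return hits
```

The function accepts any iterable of log lines, for example an open file object for the system log. A match only indicates that a bonding thread was reported as stuck; the call trace still needs to be compared with the one shown above.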

Resolution

The 802.3ad state machine lock can be acquired in both softirq and non-softirq context, but it was not taken with the _bh lock variants, which disable softirqs on the local CPU while the lock is held. As a result, a deadlock could occur if a LACPDU arrived and was processed in softirq context on a CPU that already held the lock.

The fix is to take the state machine lock with the _bh variants, which prevents this deadlock.

If you use a bonding interface with mode 4 (802.3ad), upgrading to kernel 2.6.18-128.el5 or higher resolves this problem.
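
As an illustration, the running kernel release can be compared against the fixed version stated above. This is a rough sketch under stated assumptions: the helper names are hypothetical, the comparison only looks at the leading numeric fields of the release string, and it is not a substitute for a full RPM version comparison. It is only meaningful for Red Hat Enterprise Linux 5 kernels.

```python
import re

# First kernel release containing the fix, as stated in this solution.
FIXED_RELEASE = "2.6.18-128.el5"


def release_key(release):
    """Turn a release such as '2.6.18-92.1.1.el5' into a tuple of the
    leading numeric fields: (2, 6, 18, 92, 1, 1). Hypothetical helper."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    base = tuple(int(x) for x in m.group(1, 2, 3))
    build = tuple(int(x) for x in m.group(4).split("."))
    return base + build


def has_bonding_fix(release):
    """Return True if the release is at or above FIXED_RELEASE
    (simplified numeric comparison, see the assumptions above)."""
    return release_key(release) >= release_key(FIXED_RELEASE)
```

On a live system the release string can be obtained with `uname -r`, or from Python with `platform.release()`, and passed to `has_bonding_fix()`.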

Also see https://bugzilla.redhat.com/show_bug.cgi?id=457300

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.