Why does kernel.hung_task_panic = 1 not trigger a system panic in RHEL?

Solution Verified

Environment

  • Red Hat Enterprise Linux 6
    • kernels earlier than 2.6.32-696.el6
  • Red Hat Enterprise Linux 7
    • kernels earlier than 3.10.0-693.el7

Issue

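  • Hung task messages ("INFO: task ... blocked for more than ... seconds", controlled by hung_task_timeout_secs) had been appearing in the kernel log for some time, so the administrator set kernel.hung_task_panic to 1.
  • However, even with kernel.hung_task_panic = 1, the system does not panic, and no further hung task messages are logged.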

Resolution

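  • Each hung task report decrements kernel.hung_task_warnings (default 10). Once it reaches zero, hung tasks are no longer reported and the system does not panic, even if kernel.hung_task_panic is set to 1.
  • This is expected behaviour on the affected kernels. It has been changed so that kernel.hung_task_panic takes effect regardless of the remaining warning count:
    • since Red Hat Enterprise Linux 6.9 - 2.6.32-696.el6 via RHSA-2017:0817
    • since Red Hat Enterprise Linux 7.4 - 3.10.0-693.el7 via RHSA-2017:1842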
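The difference between the affected and the fixed kernels can be modelled in user space. The following is a minimal sketch, not kernel code: old_would_panic() mirrors the early return in check_hung_task() on the affected kernels, while fixed_would_panic() is a hypothetical model that assumes the fix simply stops the warning counter from gating the panic. Both function names are illustrative only.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the affected kernels: called for a task that has
 * been blocked longer than hung_task_timeout_secs. Returns true if panic()
 * would be reached.
 */
bool old_would_panic(unsigned long *warnings, bool hung_task_panic)
{
    if (*warnings == 0)
        return false;   /* early return: panic() is never reached */
    (*warnings)--;      /* one warning consumed; the message is logged here */
    return hung_task_panic;
}

/*
 * Hypothetical model of the fixed kernels (assumption): the counter only
 * rate-limits the log message and no longer decides whether to panic.
 */
bool fixed_would_panic(unsigned long *warnings, bool hung_task_panic)
{
    if (*warnings > 0)
        (*warnings)--;  /* message logged only while warnings remain */
    return hung_task_panic;
}
```

Starting from the default of 10, ten reports with hung_task_panic = 0 drive the counter to zero. A later report with hung_task_panic = 1 then returns false under the old logic but true under the fixed one, which matches the scenario described under Issue.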

Root Cause

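In the affected kernels, check_hung_task() returns early once sysctl_hung_task_warnings reaches zero, so the sysctl_hung_task_panic check at the end of the function is never reached:

kernel/hung_task.c: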

static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
        unsigned long switch_count = t->nvcsw + t->nivcsw;

        /*
         * Ensure the task is not frozen.
         * Also, skip vfork and any other user process that freezer should skip.
         */
        if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
                return;

        /*
         * When a freshly created task is scheduled once, changes its state to
         * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
         * mustn't be checked.
         */
        if (unlikely(!switch_count))
                return;

        if (switch_count != t->last_switch_count) {
                t->last_switch_count = switch_count;
                return;
        }
        /* If hung_task_warnings is zero, return without any action (the panic below is never reached). */
        if (!sysctl_hung_task_warnings)
                return;
        /* Otherwise, decrement hung_task_warnings by 1. */
        sysctl_hung_task_warnings--;

        /*
         * Ok, the task did not get scheduled for more than 2 minutes,
         * complain:
         * (the "blocked for more than N seconds" kernel log message is printed here)
         */
        printk(KERN_ERR "INFO: task %s:%d blocked for more than "
                        "%ld seconds.\n", t->comm, t->pid, timeout);
        printk(KERN_ERR "\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
                        " disables this message.\n");
        sched_show_task(t);
        debug_show_held_locks(t);

        touch_nmi_watchdog();

        /* Reached only while hung_task_warnings was still non-zero. */
        if (sysctl_hung_task_panic) {
                trigger_all_cpu_backtrace();
                /* Panic; if kdump is configured, kexec then starts vmcore collection. */
                panic("hung_task: blocked tasks");
        }
}
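To check whether a system running an affected kernel has already used up its warnings, the two sysctls can be read from /proc/sys/kernel. The sketch below assumes CONFIG_DETECT_HUNG_TASK is enabled so that both files exist; read_sysctl_ulong(), hung_task_panic_reachable() and report_hung_task_settings() are hypothetical helper names, and the "reachable" rule applies only to the affected kernels listed under Environment.

```c
#include <stdio.h>

/* Read one unsigned integer from a /proc/sys file; returns 0 on success, -1 on error. */
int read_sysctl_ulong(const char *path, unsigned long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%lu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * On the affected kernels, a hung task can only cause a panic while
 * hung_task_panic is set AND the warning counter is still non-zero.
 */
int hung_task_panic_reachable(unsigned long warnings, unsigned long panic_flag)
{
    return panic_flag != 0 && warnings != 0;
}

/* Print the current settings; returns 1/0 for reachable/unreachable, -1 on error. */
int report_hung_task_settings(void)
{
    unsigned long warnings, panic_flag;

    if (read_sysctl_ulong("/proc/sys/kernel/hung_task_warnings", &warnings) != 0 ||
        read_sysctl_ulong("/proc/sys/kernel/hung_task_panic", &panic_flag) != 0) {
        fprintf(stderr, "hung task sysctls are not available\n");
        return -1;
    }

    int reachable = hung_task_panic_reachable(warnings, panic_flag);
    printf("hung_task_warnings=%lu hung_task_panic=%lu -> panic %s\n",
           warnings, panic_flag,
           reachable ? "possible" : "not possible on affected kernels");
    return reachable;
}
```

Calling report_hung_task_settings() from a small main() on an affected kernel with hung_task_panic = 1 and hung_task_warnings = 0 reports that a panic is not possible, which is the state described in this article.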

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.