What is "D" state (or dstate, d-state)?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 4, 5, 6, 7, 8, 9

Issue

  • What is "D" state (or dstate or d-state)?
  • What is 'D' state of process?

Resolution

  • Linux computes its load average as the average number of running or runnable processes (R state) plus the number of processes in uninterruptible sleep (D state) over the specified interval. Tasks in D state therefore raise the load average even when they consume no CPU.

  • "D" state (TASK_UNINTERRUPTIBLE) occurs in kernel code paths where execution cannot be interrupted while the task waits for an operation to complete. It is meant to be a fleeting state: in normal operation, kernel threads frequently pass in and out of TASK_UNINTERRUPTIBLE.

    • "D" state processes are blocked, waiting for an event or a resource, often disk I/O or a shared resource.
    • An example is a low-level driver talking to hardware, such as retrieving network packet data from NIC firmware or reading or writing a block of data on a hard disk drive.
    • Normally this completes extremely quickly, and threads remain in this state for very short periods (so the state is rarely observed, especially from user space).
    • The name "D" state is a historical leftover: the state was originally thought of as "Disk Wait". Today, network operations, locks, and other resources unrelated to disk I/O can also put a process into uninterruptible wait. See "Understanding Linux Process States" for additional background on process states. The following sentence sums up "D" state processes: "An Uninterruptible sleep state is one that won't handle a signal right away. It will wake only as a result of awaited-upon resource becoming available or after a time-out occurs during that wait (if the time-out is specified when the process is put to sleep)."
  • The problem arises when a thread enters "D" state and does not leave it within a reasonable amount of time. That thread is then "stuck", and any process waiting for it (possibly queued behind it for access to the same hardware) or relying on it is stuck as well.

    • While "reasonable amount of time" is subjective, if a task remains stalled in D state for too long, the kernel logs "blocked for more than ... seconds" messages to notify the system administrator of a situation that needs investigation or possibly system tuning changes.
  • To see which driver is holding the process/thread in "D" state:

    • Obtain the list of threads in "D" state: ps auxH | awk '$8 ~ /^D/{print}'

    • Show the kernel stack of each thread: sudo cat /proc/<PID>/stack

          for D_PID in $(ps auxH | awk '$8 ~ /^D/{print $2}' | sort -u); do ps -Llp "$D_PID"; sudo cat "/proc/$D_PID/stack"; echo; done
      
  • See also "Why does a system high-load occur? Load average is high."
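
The per-thread lookup above can also be sketched with explicit thread IDs. With "ps auxH" the PID column repeats the process ID for every thread, and /proc/<PID>/stack shows only the main thread, so this variant reads /proc/<PID>/task/<TID>/stack instead. The function names d_state_threads and dump_d_stacks are hypothetical helpers, not part of any tool, and the sketch assumes a procps-ng "ps" and root access to the stack files.

```shell
# Filter: keep only rows whose third column (STAT) starts with "D".
# Reads "PID TID STAT WCHAN COMMAND" rows on stdin, so it also works
# on ps output saved earlier.
d_state_threads() {
    awk '$3 ~ /^D/ {print}'
}

# For every thread in D state, print a header and its kernel stack.
# Reading /proc/<PID>/task/<TID>/stack normally requires root.
dump_d_stacks() {
    ps -eL --no-headers -o pid,lwp,stat,wchan:32,comm | d_state_threads |
    while read -r pid tid stat wchan comm; do
        echo "== PID $pid TID $tid ($comm) state=$stat wchan=$wchan"
        sudo cat "/proc/$pid/task/$tid/stack" 2>/dev/null || echo "(stack unavailable)"
        echo
    done
}

# Usage (run manually): dump_d_stacks
```

The functions at the top of each stack usually name the driver or filesystem code the thread is waiting in.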

Root Cause

  • Finding processes in D state is fairly common and normal.
  • When a process stays in D state, the cause is in most cases interrupted access to an I/O resource (typically local or remote storage, a network filesystem, etc.).
  • If a process remains stuck in D state for too long, the kernel's "stalled task" logic is triggered.
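
As a rough sketch of how to check that logic on a running system: the hung-task detector's timeout is exposed in /proc/sys/kernel/hung_task_timeout_secs on kernels built with it (a value of 0 disables the check), and its reports appear in the kernel log. The helper names below are hypothetical, and the message pattern assumes the usual "INFO: task <name>:<pid> blocked for more than <N> seconds." wording, which may differ between kernel versions.

```shell
# Print the hung-task timeout in seconds, or "unavailable" when the
# kernel does not expose the setting.
hung_task_timeout() {
    cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || echo "unavailable"
}

# Read kernel log lines on stdin and print "name pid seconds" for each
# hung-task report found (assumed message format, see above).
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Usage (run manually):
#   hung_task_timeout
#   dmesg | blocked_tasks
```

Each reported task is normally followed in the log by its kernel stack trace, which shows where it was blocked.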

Diagnostic Steps

  • Check the ps output for threads in D state, for example with:
    ps auxH | awk '$8 ~ /^D/{print}'
  • Load may be high and still rising: values in the hundreds are possible, with the 1-minute load higher than the 5-minute load and the 5-minute load higher than the 15-minute load, which points to a steadily increasing load. At the same time, the machine's responsiveness does not match the high number (it appears to respond normally on the command line).

  • The first step towards resolution (assuming both checks above are positive) is to isolate the resource, most likely storage or a filesystem, that is causing the condition. It should stand out as the common denominator of the processes in D state (current working directory, etc.).

  • Once the resource has been identified, take steps to recover access to it. Depending on the situation, it might be possible to regain access to the filesystem or storage online, but most likely a reboot of the machine is required for a full recovery.
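
The load check and the "common denominator" search can be sketched as two small helpers. Both names are hypothetical. load_trend takes one line in /proc/loadavg format on stdin; d_state_cwds counts the working directories of D state processes and assumes a procps-ng "ps" plus enough privileges to read other users' /proc/<PID>/cwd links.

```shell
# Print "rising" when 1-minute load > 5-minute load > 15-minute load,
# otherwise "not rising". Expects /proc/loadavg format on stdin.
load_trend() {
    awk '{ if ($1 + 0 > $2 + 0 && $2 + 0 > $3 + 0) print "rising"; else print "not rising" }'
}

# Count the current working directories of processes in D state,
# most frequent first. Unreadable cwd links are skipped silently.
d_state_cwds() {
    ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}' |
    while read -r pid; do
        readlink "/proc/$pid/cwd" 2>/dev/null
    done | sort | uniq -c | sort -rn
}

# Usage (run manually):
#   load_trend < /proc/loadavg
#   d_state_cwds
```

A directory or mount point that dominates the second list is a reasonable first suspect, though the working directory is only a hint and not proof of which resource is blocked.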


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.