The host encountered a kernel panic with the message "GAB WARNING"

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux
  • Veritas cluster

Issue

  • The host crashed and rebooted. The logs show:
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure
kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure
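
When triaging a log like the one above, it can help to count the GAB message IDs and pick out the final panic line. The following is a minimal sketch, not a Red Hat or Veritas tool: the function name `summarize_gab` and the regular expressions are illustrative assumptions based only on the line format shown in this article.

```python
import re

# Assumed line shapes, inferred only from the messages quoted in this article.
GAB_RE = re.compile(r"GAB (?P<severity>[A-Z]+) (?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)")
PANIC_RE = re.compile(r"Kernel panic - not syncing: GAB: (?P<text>.*)")


def summarize_gab(lines):
    """Return (count per GAB message ID, text of the last GAB panic line or None)."""
    counts = {}
    panic = None
    for line in lines:
        m = GAB_RE.search(line)
        if m:
            counts[m["msgid"]] = counts.get(m["msgid"], 0) + 1
            continue
        p = PANIC_RE.search(line)
        if p:
            panic = p["text"]
    return counts, panic


# The four log lines from the Issue section.
SAMPLE = [
    "kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure",
    "kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure",
    "kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure",
    "kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure",
]

if __name__ == "__main__":
    counts, panic = summarize_gab(SAMPLE)
    for msgid, n in counts.items():
        print(f"{msgid}: {n} occurrence(s)")
    print("panic:", panic)
```

In practice the input would be the lines of the system log or the kernel ring buffer recovered from the vmcore rather than the hard-coded sample.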

Resolution

  • Contact Red Hat Support and Veritas support, providing a vmcore and logs to each for further analysis.

Root Cause

The messages above show that VCS fenced the host by deliberately panicking it. The situation appears similar to the one described in the following Symantec document:

Intentional Panic of a node in a Veritas Cluster Server cluster by GAB (www.symantec.com)

Excerpt:

When a node or domain gets so busy that the VCS engine (HAD) does not respond to other cluster members, GAB panics the node. In VCS 1.3 and later, GAB tries to kill HAD five times before panicking the node or before committing to panic the node.

This means that buffers are flushed and a core dump takes place, essentially providing diagnostic information to determine the cause of the panic. This behavior is better than a halt because a halt does not provide any diagnostic information and does not guarantee that a node is halted within a definite period of time. If the node is very busy, the halt process may get swapped out.

The HAD process regularly heartbeats to the GAB module. The HAD process may get timed out because of various reasons:

  • It may not get a chance to run because of the system load. In order to minimize this possibility, the HAD process runs as a high priority real time (RT) process.
  • It may be swapped out because of lack of physical memory. In VCS 1.3 and above, HAD startup pages are locked in memory.
  • It may be executing a system call in the kernel.
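
The behavior described in the excerpt can be pictured with a toy model. This is only an illustrative sketch and not GAB's actual implementation: the class `HeartbeatWatchdog`, its `timeout` value, and the choice to reset the kill counter on a fresh heartbeat are all assumptions. The only detail taken from the excerpt is the limit of five kill attempts before the panic.

```python
from dataclasses import dataclass, field

MAX_KILL_ATTEMPTS = 5  # from the excerpt: GAB tries to kill HAD five times


@dataclass
class HeartbeatWatchdog:
    """Toy model of a watchdog that monitors a client's heartbeat (hypothetical)."""
    timeout: float            # hypothetical heartbeat timeout, in seconds
    last_beat: float = 0.0
    kill_attempts: int = 0
    events: list = field(default_factory=list)

    def heartbeat(self, now):
        # The client (HAD in the excerpt) checked in on time.
        self.last_beat = now
        self.kill_attempts = 0  # assumption: a fresh heartbeat resets the counter

    def check(self, now):
        # Return "ok", "kill", or "panic" for the current point in time.
        if now - self.last_beat <= self.timeout:
            return "ok"
        if self.kill_attempts < MAX_KILL_ATTEMPTS:
            self.kill_attempts += 1
            self.events.append(f"kill attempt {self.kill_attempts}")
            return "kill"
        self.events.append("panic")
        return "panic"


if __name__ == "__main__":
    wd = HeartbeatWatchdog(timeout=15.0)
    wd.heartbeat(now=0.0)
    # The client then stops responding, for example because the host is out of memory.
    for t in range(100, 106):
        print(t, wd.check(now=float(t)))
```

In this model a starved client that never heartbeats again receives five kill attempts and the sixth check ends in a panic, which mirrors the sequence of "attempting to kill process" warnings followed by the panic seen in the logs.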

Diagnostic Steps

  • The system crashed after the following messages:
kernel: --- salvaged messages from crash dump start
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure
kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure
kernel: ----------- [cut here ] --------- [please bite here ] ---------
kernel: Kernel BUG at panic:75
  • The vmcore shows that the system had consumed almost all available memory, including swap:
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  24711401      94.3 GB         ----
      FREE    52603     205.5 MB    0% of TOTAL MEM
      USED  24658798      94.1 GB   99% of TOTAL MEM
    SHARED    14484      56.6 MB    0% of TOTAL MEM
   BUFFERS      149       596 KB    0% of TOTAL MEM
    CACHED     4104        16 MB    0% of TOTAL MEM
      SLAB    64841     253.3 MB    0% of TOTAL MEM

TOTAL HIGH        0            0    0% of TOTAL MEM
 FREE HIGH        0            0    0% of TOTAL HIGH
 TOTAL LOW  24711401      94.3 GB  100% of TOTAL MEM
  FREE LOW    52603     205.5 MB    0% of TOTAL LOW

TOTAL SWAP   524286         2 GB         ----
 SWAP USED   524286         2 GB  100% of TOTAL SWAP
 SWAP FREE        0            0    0% of TOTAL SWAP
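
The figures in the table can be cross-checked from the page counts. The sketch below assumes a 4 KiB page size, which is typical for x86_64 but is not stated in this article, and assumes that the sizes are binary units (1 GB = 2^30 bytes) and that percentages are truncated to whole numbers. The helper names are illustrative.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; not stated in the article


def human(pages, page_size=PAGE_SIZE):
    """Format a page count as a size using binary units (assumed convention)."""
    nbytes = pages * page_size
    for unit, factor in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if nbytes >= factor:
            return f"{nbytes / factor:.1f} {unit}"
    return f"{nbytes} B"


def percent(part, total):
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return part * 100 // total


# Page counts copied from the table above.
TOTAL_MEM, FREE, USED = 24711401, 52603, 24658798
TOTAL_SWAP, SWAP_USED = 524286, 524286

if __name__ == "__main__":
    print("total mem:", human(TOTAL_MEM))                          # 94.3 GB
    print("free     :", human(FREE), percent(FREE, TOTAL_MEM))     # 205.5 MB 0
    print("used     :", human(USED), percent(USED, TOTAL_MEM))     # 94.1 GB 99
    print("swap used:", percent(SWAP_USED, TOTAL_SWAP), "%")       # 100 %
```

With roughly 205 MB free out of 94.3 GB and swap fully consumed, memory exhaustion is a plausible reason for HAD missing its heartbeats, which matches the second cause listed in the excerpt.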

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.