The host encountered a kernel panic with the message "GAB WARNING"
Environment
- Red Hat Enterprise Linux
- Veritas cluster
Issue
- The host crashed and rebooted. The log messages show:
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure
kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure
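When a longer log has to be checked for the same pattern, a small script can count the GAB warning IDs and flag the GAB-initiated panic. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions, and the sample lines are copied from the messages above.

```python
import re

# Matches lines such as "GAB WARNING V-15-1-20035 Port h ..." (pattern is an assumption
# derived from the messages shown in this article, not from Veritas documentation).
GAB_LINE = re.compile(r"GAB WARNING (V-\d+-\d+-\d+) Port (\w+)")


def summarize_gab_warnings(lines):
    """Count GAB warning IDs and report whether a GAB-initiated panic was logged."""
    counts = {}
    panicked = False
    for line in lines:
        match = GAB_LINE.search(line)
        if match:
            msg_id = match.group(1)
            counts[msg_id] = counts.get(msg_id, 0) + 1
        if "Kernel panic" in line and "GAB:" in line:
            panicked = True
    return counts, panicked


# Sample input: the four messages from the Issue section.
sample = [
    "kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure",
    "kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure",
    "kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure",
    "kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure",
]

counts, panicked = summarize_gab_warnings(sample)
print(counts, panicked)
```

To scan a real log, pass an open file object instead of the sample list, since the function only iterates over lines.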
Resolution
- Contact Red Hat Support and Veritas support, providing a vmcore and logs to each for further analysis.
Root Cause
The above messages show that VCS fenced the node by deliberately panicking it. The situation appears similar to the one described in the following Symantec document:
Excerpt:
When a node or domain gets so busy that the VCS engine (HAD) does not respond to other cluster members, GAB panics the node. In VCS 1.3 and later, GAB tries to kill HAD five times before panicking the node or before committing to panic the node.
This means that buffers are flushed and a core dump takes place, essentially providing diagnostic information to determine the cause of the panic. This behavior is better than a halt because a halt does not provide any diagnostic information and does not guarantee that a node is halted within a definite period of time. If the node is very busy, the halt process may get swapped out.
The HAD process regularly heartbeats to the GAB module. The HAD process may get timed out because of various reasons:
- It may not get a chance to run because of the system load. In order to minimize this possibility, the HAD process runs as a high priority real time (RT) process.
- It may be swapped out because of lack of physical memory. In VCS 1.3 and above, HAD startup pages are locked in memory.
- It may be executing a system call in the kernel.
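The first two points in the list above (real-time priority and locked memory) can be checked on a running Linux system. The following is a rough sketch only: the helper names are hypothetical, the process ID of HAD has to be looked up separately, and the code assumes the standard VmLck field in /proc/<pid>/status.

```python
import os


def locked_memory_kb(status_text):
    """Return the VmLck value in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            return int(line.split()[1])
    return None


def is_realtime(pid):
    """True if the process uses a real-time policy (SCHED_FIFO or SCHED_RR). Linux only."""
    return os.sched_getscheduler(pid) in (os.SCHED_FIFO, os.SCHED_RR)


def inspect_process(pid):
    """Report scheduling class and locked memory for a process ID (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as status_file:
        status_text = status_file.read()
    return {"realtime": is_realtime(pid), "locked_kb": locked_memory_kb(status_text)}
```

Calling inspect_process() with the HAD process ID would show whether the process is running with a real-time policy and how much of its memory is locked. Neither check covers the third point, a process blocked inside a system call.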
Diagnostic Steps
- The system crashed after the following messages:
kernel: --- salvaged messages from crash dump start
kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
kernel: GAB WARNING V-15-1-20138 Port h isolated due to client process failure
kernel: Kernel panic - not syncing: GAB: Port h halting system due to client process failure
kernel: ----------- [cut here ] --------- [please bite here ] ---------
kernel: Kernel BUG at panic:75
- The system had consumed almost all available memory, including swap:
                 PAGES       TOTAL    PERCENTAGE
TOTAL MEM     24711401     94.3 GB    ----
FREE             52603    205.5 MB    0% of TOTAL MEM
USED          24658798     94.1 GB    99% of TOTAL MEM
SHARED           14484     56.6 MB    0% of TOTAL MEM
BUFFERS            149      596 KB    0% of TOTAL MEM
CACHED            4104       16 MB    0% of TOTAL MEM
SLAB             64841    253.3 MB    0% of TOTAL MEM
TOTAL HIGH           0           0    0% of TOTAL MEM
FREE HIGH            0           0    0% of TOTAL HIGH
TOTAL LOW     24711401     94.3 GB    100% of TOTAL MEM
FREE LOW         52603    205.5 MB    0% of TOTAL LOW
TOTAL SWAP      524286        2 GB    ----
SWAP USED       524286        2 GB    100% of TOTAL SWAP
SWAP FREE            0           0    0% of TOTAL SWAP
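The figures in the listing above can be cross-checked from the page counts. The sketch below assumes a 4 KiB page size, which is not stated in the output but is consistent with the totals shown; the helper names are illustrative.

```python
PAGE_SIZE = 4096  # bytes per page; an assumption (4 KiB), consistent with the listing


def pages_to_gib(pages):
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE / 2**30


def percent_of(part, total):
    """Integer percentage, truncated rather than rounded."""
    return 100 * part // total


total_mem, used_mem, free_mem = 24711401, 24658798, 52603
total_swap, used_swap = 524286, 524286

print(f"TOTAL MEM: {pages_to_gib(total_mem):.1f} GB")
print(f"USED:      {pages_to_gib(used_mem):.1f} GB, {percent_of(used_mem, total_mem)}% of total")
print(f"FREE:      {free_mem * PAGE_SIZE / 2**20:.1f} MB, {percent_of(free_mem, total_mem)}% of total")
print(f"SWAP USED: {percent_of(used_swap, total_swap)}% of total swap")
```

With roughly 205 MB free out of 94.3 GB and swap fully used, the memory-exhaustion explanation for HAD missing its heartbeat is plausible, although the listing alone does not prove it.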