How to collect system information to provide to Red Hat Support for analysis when a system hangs
Environment
- Red Hat Enterprise Linux [All versions]
Issue
- How can I collect system information to provide to Red Hat Support for analysis when the system hangs or becomes unresponsive?
- What can be done if RHEL system goes to hang state?
- How to start with root cause analysis (RCA) for system hung issues?
Resolution
Note: This document only covers common situations with unresponsive systems. Please consult Red Hat Support for specific cases: Red Hat Technical Support – contact numbers and availability
For root cause analysis of why a system became unresponsive, various pieces of information are necessary, including a system core dump ("vmcore").
Preparation steps
- Pre-configure kdump/netdump/diskdump
- Enable sysrq
- Enable nmi_watchdog
- Test the configuration to confirm that it works.
These steps are discussed in more detail below.
Steps to collect information when the problem happens
1. Check the status of the system
Check whether:
- You can log in to the system via ssh or telnet,
- You can log in to the system via the console,
- The system responds to ping,
- The system responds to the keyboard or mouse in any way.
If any of these gets a response, the system can still respond to at least some interrupts.
2. Get information about the system state through sysrq
- Note: If you plan to follow the next step, "3. Crash the system to obtain a vmcore", skip this step, because the vmcore captures the full system state.
- Run the following key combinations:
Alt + SysRq + m
Alt + SysRq + w
Alt + SysRq + t
Alt + SysRq + p
Note: Run this sequence of key combinations three times, about 3 minutes apart. Capture the output printed on the screen directly or, if the system is configured for netconsole, from the server that logs its console messages.
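If a root shell (for example, an ssh session that is still open) responds, the same information can be requested without the keyboard by writing the key letters to /proc/sysrq-trigger. Writes to that file by root are accepted regardless of the kernel.sysrq setting, which only governs the keyboard. The following is a minimal sketch, not a Red Hat tool; the function name and the SYSRQ_TRIGGER, ROUNDS and INTERVAL variables are hypothetical, chosen so the target file and timing can be overridden.

```shell
#!/bin/sh
# Minimal sketch: send SysRq m, w, t and p through /proc/sysrq-trigger
# several times, as an alternative to the Alt+SysRq key combinations.
# SYSRQ_TRIGGER, ROUNDS and INTERVAL are hypothetical knobs for this sketch.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
ROUNDS=${ROUNDS:-3}        # number of rounds, as recommended above
INTERVAL=${INTERVAL:-180}  # seconds between rounds (about 3 minutes)

collect_sysrq_rounds() {
    round=1
    while [ "$round" -le "$ROUNDS" ]; do
        for key in m w t p; do
            # m: memory info, w: blocked tasks, t: all tasks, p: CPU registers
            echo "$key" > "$SYSRQ_TRIGGER" || return 1
            echo "round $round: sent sysrq $key"
        done
        # No need to wait after the final round.
        if [ "$round" -lt "$ROUNDS" ]; then
            sleep "$INTERVAL"
        fi
        round=$((round + 1))
    done
}
```

The output goes to the kernel log rather than to the terminal, so after running `collect_sysrq_rounds` as root, save it with something like `dmesg > /tmp/sysrq-output.txt`. On large systems the `t` output may overflow the kernel ring buffer, so also check /var/log/messages or the netconsole server.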
3. Crash the system to obtain a vmcore
- Run the following key combination:
Alt + SysRq + c
4. Get sosreport/sysreport
When the vmcore has been written, the system should reboot. After the system is back up, generate a sosreport/sysreport.
- Red Hat Enterprise Linux 4.6 and later:
# sosreport
- Red Hat Enterprise Linux 4.5 and earlier:
# sysreport
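Which command exists depends on the release: newer sos packages provide `sos report`, older ones `sosreport`, and the oldest releases `sysreport`. A minimal sketch that prints whichever is installed follows; the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: print the report command available on this system.
# Checked in order: "sos report" (newer sos), "sosreport", "sysreport".
# pick_report_command is a hypothetical helper name.
pick_report_command() {
    if command -v sos >/dev/null 2>&1; then
        echo "sos report"
    elif command -v sosreport >/dev/null 2>&1; then
        echo "sosreport"
    elif command -v sysreport >/dev/null 2>&1; then
        echo "sysreport"
    else
        return 1   # none of the report tools is installed
    fi
}
```

As root, `cmd=$(pick_report_command) && $cmd` then runs the tool (the unquoted `$cmd` is intentional so that `sos report` splits into a command and its subcommand).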
Note:
- If the keyboard or mouse is connected through a KVM switch, the system may not respond when they are used. If possible, connect a PS/2 keyboard directly to the system instead.
- If you are unsure of the situation the system is in, please contact Red Hat Support first.
- Capturing a vmcore may take a long time, during which the system cannot perform its regular functions. You may need to make a business decision whether to wait for the vmcore to be captured (which increases the chances that the root cause can be identified and a recurrence prevented) or to restore the system to its regular functions on short notice.
- Some systems use warning lights or beeps to inform the operator of a hardware error. If the system reports a hardware error, please also report this information to Red Hat Support.
Comments
To collect this information, some settings need to be configured in advance.
1. Configure kdump/netdump/diskdump
Please refer to the following articles.
- Red Hat Enterprise Linux 5, 6, 7, 8 and 9 : How do I troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux?
- Red Hat Enterprise Linux 3 and 4: How do I configure netdump on Red Hat Enterprise Linux 3 and 4? (If you want to use diskdump instead of netdump, please refer to: How do I use the diskdump utility to capture a vmcore in Red Hat Enterprise Linux 3 and 4?)
Note:
- For netdump, the system needs to have a network card whose driver supports the netdump feature; refer to How do I find out what Ethernet network card drivers support the kernel debugging tool netdump in Red Hat Enterprise Linux 4?
- Diskdump is a deprecated method for capturing a vmcore. If possible, use netdump instead.
2. Enable sysrq
The Magic SysRq key is a key combination that the kernel responds to regardless of what else it is doing, unless it is completely locked up. For security reasons, Red Hat Enterprise Linux disables the SysRq key by default.
- To enable sysrq immediately, run:
# echo 1 > /proc/sys/kernel/sysrq
- To enable it permanently, set the kernel.sysrq value in /etc/sysctl.conf to 1. That will cause it to be enabled on reboot:
kernel.sysrq = 1
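Values other than 0 and 1 are a bitmask. According to the kernel's sysrq documentation, 0 disables SysRq, 1 enables every function, and in the bitmask the value 8 enables the "debugging dumps" group, to which the m, w, t, p and c keys used in this article belong. A minimal sketch of a check for that group follows; the function name and the SYSRQ_SYSCTL override are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: report whether a kernel.sysrq value lets the keyboard use
# the dump keys from this article (m, w, t, p and c).
# 0 = disabled, 1 = all functions, otherwise a bitmask where 8 = dumps.
SYSRQ_SYSCTL=${SYSRQ_SYSCTL:-/proc/sys/kernel/sysrq}   # hypothetical override

sysrq_dump_keys_enabled() {
    # Use the value given as $1, or read the live setting.
    value=${1:-$(cat "$SYSRQ_SYSCTL")}
    if [ "$value" -eq 1 ]; then
        return 0
    fi
    # Test bit 8 of the bitmask.
    [ $((value & 8)) -ne 0 ]
}
```

For example, `sysrq_dump_keys_enabled || echo "set kernel.sysrq = 1"` prints a reminder when the current setting does not cover these keys.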
3-1. Enable nmi_watchdog
The Non-Maskable Interrupt (NMI) watchdog in Red Hat Enterprise Linux is a mechanism used to detect system lockups. It has been available since Red Hat Enterprise Linux 3 Update 3. By default, nmi_watchdog is enabled on 64-bit systems. For instructions on enabling nmi_watchdog, refer to: What is NMI and what can I use it for?
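A common way to see whether the watchdog is generating NMIs is to compare the "NMI:" line of /proc/interrupts at two points in time; newer kernels also expose /proc/sys/kernel/nmi_watchdog. A rising count is consistent with, but does not prove, an active watchdog, since other sources such as perf also raise NMIs. The sketch below assumes that approach; the function names, the INTERRUPTS_FILE override and the 15-second default wait are hypothetical choices.

```shell
#!/bin/sh
# Minimal sketch: sum the per-CPU counters on the "NMI:" line of
# /proc/interrupts and check whether the total grows over time.
INTERRUPTS_FILE=${INTERRUPTS_FILE:-/proc/interrupts}   # hypothetical override

nmi_total() {
    # Add up the numeric fields; the trailing description text is skipped.
    awk '$1 == "NMI:" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) sum += $i
    } END { print sum + 0 }' "$INTERRUPTS_FILE"
}

nmi_watchdog_ticking() {
    first=$(nmi_total)
    sleep "${1:-15}"   # watchdog NMIs can be several seconds apart per CPU
    second=$(nmi_total)
    [ "$second" -gt "$first" ]
}
```

`nmi_watchdog_ticking && echo "NMI count is increasing"` takes two readings 15 seconds apart; pass a larger number of seconds as the first argument on systems with few CPUs.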
3-2. Enable NMI switch
Sometimes the SysRq key and the NMI watchdog do not work. In that case, use the NMI switch. For details of the NMI switch, refer to How can I configure my system to crash when NMI switch is pushed?
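The kernel has sysctls that make it panic, and therefore run kdump, when it receives NMIs; per the kernel's sysctl documentation, kernel.unknown_nmi_panic covers unknown NMIs such as those raised by many NMI switches, and kernel.panic_on_io_nmi covers I/O check NMIs that some servers send when a dump button is pushed. Which settings your hardware needs is an assumption here; the article linked above is authoritative. A minimal sketch that writes them to a sysctl file follows; the file name and NMI_SYSCTL_FILE override are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: persist sysctls that make the kernel panic on NMIs.
# The file name is a hypothetical choice; on releases without
# /etc/sysctl.d, put the same lines in /etc/sysctl.conf instead.
NMI_SYSCTL_FILE=${NMI_SYSCTL_FILE:-/etc/sysctl.d/99-nmi-panic.conf}

write_nmi_panic_sysctls() {
    cat > "$NMI_SYSCTL_FILE" <<'EOF'
# Panic on NMIs with an unknown source (typical of NMI switches)
kernel.unknown_nmi_panic = 1
# Panic on memory/unknown NMIs the kernel cannot recover from
kernel.panic_on_unrecovered_nmi = 1
# Panic on I/O check NMIs (used by some servers' dump buttons)
kernel.panic_on_io_nmi = 1
EOF
}
```

After running `write_nmi_panic_sysctls` as root, `sysctl -p /etc/sysctl.d/99-nmi-panic.conf` applies the settings without a reboot.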
4. Enable netconsole
Netconsole transmits kernel (dmesg) messages over the network to a remote syslog server. It implements kernel-level network logging via UDP port 514. Please see the following article for reference: How do I configure netconsole?
Note:
- In RHEL5, netconsole over a bonded network interface is only supported in kernel 2.6.18-238 and later.
- If the system has serial connections available, you can set up a serial console to capture dmesg output, or even to control the system over a serial terminal session. Please check the following article for reference: How do I set up a serial terminal and/or console in Red Hat Enterprise Linux?
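On Red Hat Enterprise Linux the netconsole service normally builds the module options from settings such as SYSLOGADDR in /etc/sysconfig/netconsole. When loading the module by hand, the kernel's netconsole documentation gives the parameter format `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`, where an omitted MAC address means broadcast. A minimal sketch that assembles this string follows; the function name and example addresses are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: build a netconsole module parameter string.
# Arguments: src-port src-ip dev tgt-port tgt-ip tgt-macaddr
# (any of them except tgt-ip may be empty to use the kernel default).
netconsole_param() {
    src_port=$1 src_ip=$2 dev=$3 tgt_port=$4 tgt_ip=$5 tgt_mac=$6
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

For example, `modprobe netconsole "$(netconsole_param 6665 192.0.2.10 eth0 514 192.0.2.20 '')"` (run as root) would send kernel messages from eth0 to UDP port 514 on 192.0.2.20.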
5. Install sysstat package
This package provides the sar and iostat commands for Linux, which monitor disk, network, and other I/O activity. The package is not installed by default; install it manually with yum or up2date:
- Red Hat Enterprise Linux 5 and 6:
# yum install sysstat
- Red Hat Enterprise Linux 3 and 4:
# up2date -i sysstat
By default, sar data is collected every 10 minutes. To collect it every minute, edit /etc/cron.d/sysstat:
- /etc/cron.d/sysstat before the edit:
*/10 * * * * root /usr/lib/sa/sa1 1 1
- /etc/cron.d/sysstat after the edit:
*/1 * * * * root /usr/lib/sa/sa1 1 1
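The cron edit above can also be made with sed. This is a minimal sketch assuming the file still contains the default `*/10` sa1 line; it also matches the /usr/lib64/sa/sa1 path used on 64-bit installs. The function name and SYSSTAT_CRON override are hypothetical, and no backup copy is written into /etc/cron.d because cron could pick it up as an extra job file.

```shell
#!/bin/sh
# Minimal sketch: switch the sa1 job in /etc/cron.d/sysstat from every
# 10 minutes to every minute; other lines (such as sa2) are left alone.
SYSSTAT_CRON=${SYSSTAT_CRON:-/etc/cron.d/sysstat}   # hypothetical override

set_sa1_every_minute() {
    # GNU sed in-place edit: only a line starting with "*/10" and ending
    # with ".../sa1 1 1" is rewritten.
    sed -i 's#^\*/10\([[:space:]].*/sa1 1 1\)$#*/1\1#' "$SYSSTAT_CRON"
}
```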
6. Test kdump/netdump/diskdump configuration
- Run the following command to trigger kernel panic:
# echo c > /proc/sysrq-trigger
This test serves two purposes:
- To confirm that the dump configuration works and captures a vmcore file.
- To measure how long capturing the vmcore file takes, so that the system administrator can decide whether that much downtime is acceptable.
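For kdump, a minimal pre-flight wrapper can refuse to crash the system when no capture kernel is loaded and record the start time on disk for the duration measurement. This sketch assumes a kernel that provides /sys/kernel/kexec_crash_loaded (it does not apply to netdump or diskdump); the function names and the KEXEC_CRASH_LOADED, SYSRQ_TRIGGER and START_STAMP overrides are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: check that a kdump capture kernel is loaded, note the
# start time, then trigger the test panic. Maintenance window only.
KEXEC_CRASH_LOADED=${KEXEC_CRASH_LOADED:-/sys/kernel/kexec_crash_loaded}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
START_STAMP=${START_STAMP:-/var/tmp/kdump-test-start}

crash_kernel_loaded() {
    # The sysfs file reads "1" when a crash kernel has been loaded.
    [ -r "$KEXEC_CRASH_LOADED" ] && [ "$(cat "$KEXEC_CRASH_LOADED")" = "1" ]
}

trigger_test_crash() {
    if ! crash_kernel_loaded; then
        echo "crash kernel not loaded; check the kdump service first" >&2
        return 1
    fi
    # Save the start time and flush it to disk before the panic.
    date '+%Y-%m-%d %H:%M:%S' > "$START_STAMP" || return 1
    sync
    echo c > "$SYSRQ_TRIGGER"
}
```

After the reboot, comparing the saved time with the vmcore's modification time (for example, `ls -l --time-style=full-iso /var/crash/*/vmcore` with the default kdump path) gives a rough figure for the capture time.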
Note:
- The command echo c > /proc/sysrq-trigger triggers a kernel panic manually, which crashes the system and stops all applications. Run it only during a maintenance window.
- If the system being configured for kdump/netdump/diskdump is a node of a Red Hat Cluster, increase the post_fail_delay parameter in cluster.conf. It must be long enough for the system to finish dumping memory before it is fenced by the other nodes.
- If the production environment uses iptables, make sure the required ports are not blocked:
  - kdump can transfer the vmcore file over the network using services such as ssh and nfs. All ports that ssh or nfs use must be accepted by the iptables settings.
  - The netdump server and client both use UDP port 6666 by default. The netdump server port can be set in /etc/sysconfig/netdump-server (for details, see man 8 netdump-server). The netdump client port can be specified in /etc/sysconfig/netdump.
  - netconsole needs UDP port 514 for syslog.
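The port list above can be turned into iptables rules on the host that receives the dumps or logs. This is a minimal sketch assuming the default ports named above and ssh on TCP 22; NFS needs more than one port (2049 plus rpcbind and mountd, which may be dynamic), so it is left out. The function names and the IPTABLES override (for example `IPTABLES="echo iptables"` for a dry run) are hypothetical, and rules added this way do not persist across reboots.

```shell
#!/bin/sh
# Minimal sketch: accept incoming dump/log traffic on the receiving host.
IPTABLES=${IPTABLES:-iptables}   # hypothetical override for dry runs

allow_port() {
    proto=$1 port=$2
    # $IPTABLES is unquoted on purpose so "echo iptables" splits into words.
    $IPTABLES -I INPUT -p "$proto" --dport "$port" -j ACCEPT
}

allow_dump_ports() {
    allow_port udp 6666 &&  # netdump default port
    allow_port udp 514 &&   # netconsole to syslog
    allow_port tcp 22       # kdump over ssh
}
```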
7. sosreport/sysreport
Refer to the following articles.
- What is an "sosreport" and how do I create it in Red Hat Enterprise Linux 4.6 and later?
- What is a sysreport and how do I run it in Red Hat Enterprise Linux?
8. Capturing a vmcore from a hypervisor
Refer to the following articles.
- How to capture a vmcore of hung Red Hat Enterprise Linux VMware® guest system using VMware® "vmss2core" tool?
- How do I capture a vmcore from a KVM or RHEV guest?
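For a KVM guest managed by libvirt, one approach is to run `virsh dump` with `--memory-only` on the host, which writes a memory dump of the guest that the crash utility can analyze, on libvirt versions that support the option. The sketch below only wraps that command; the function name, the VIRSH override and the example guest name and path are hypothetical, and the article above remains the authoritative procedure.

```shell
#!/bin/sh
# Minimal sketch: dump a hung KVM guest's memory from the host via libvirt.
VIRSH=${VIRSH:-virsh}   # hypothetical override, e.g. "echo virsh" for a dry run

dump_guest_memory() {
    guest=$1 outfile=$2
    # --memory-only writes guest memory only; --verbose shows progress.
    $VIRSH dump "$guest" "$outfile" --memory-only --verbose
}
```

For example, `dump_guest_memory rhel-guest /var/tmp/rhel-guest.vmcore` (run as root on the host) writes the dump to /var/tmp.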
References
- kdump: /usr/share/doc/kexec-tools-<version>/kexec-kdump-howto.txt
- diskdump: /usr/share/doc/diskdumputils-<version>/README
- sysrq: /usr/share/doc/kernel-doc-<version>/Documentation/sysrq.txt
- nmi_watchdog: /usr/share/doc/kernel-doc-<version>/Documentation/nmi_watchdog.txt
Useful Links
- Official Website: http://www.redhat.com/
- Knowledge Base: https://access.redhat.com/site/
- Product manuals: https://access.redhat.com/site/documentation/
- Red Hat Network: https://access.redhat.com/management/
- Subscription Information: https://www.redhat.com/support/
- Production Support Service Level Agreement: https://access.redhat.com/support/offerings/production/sla.html
- Production Support Scope of Coverage: https://access.redhat.com/support/offerings/production/soc
- Severity Definition: https://access.redhat.com/support/policy/severity/
- Life Cycle: https://access.redhat.com/support/policy/update_policies.html
- Hardware Certification: http://hardware.redhat.com/
- Compatible Software: https://www.redhat.com/wapps/isvcatalog/home.html
- Product Comparison Chart: http://www.redhat.com/products/enterprise-linux/server/compare.html
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.