How to create a basic centralized crash analysis system to analyze vmcore locally
Environment
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
Issue
- Transferring large vmcore files can be extremely time-consuming or not realistically possible in many environments. In some environments it is also not practical or possible to install the kernel-debuginfo and crash utility packages locally on affected systems, but a centralized system can be used for crash analysis.
Resolution
- If the correct kernel files are available on the centralized system, it should be able to analyze vmcores from any version of RHEL. Alternatively, individual remote systems can be configured for crash analysis if an organization has multiple remote locations.
- The following steps can be used for basic crash analysis on a centralized system.
1. kdump must be properly configured and tested on all systems so that a vmcore file is generated for analysis.
How do I configure kdump
Understanding how kdump works
Additional information on configuring the crashkernel parameter on RHEL7
2. The crash package must be installed on the centralized system that will be analyzing the core files.
# yum install crash
3. A working directory structure is recommended for storing the necessary files - this should include directories for kernel files, scripts, temporary cores and output files at minimum. These files will be rather large so proper space needs to be allocated accordingly. An example would be:
/cas/cores
/cas/kernels
/cas/output
/cas/scripts
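The example layout above can be created in one step. The following is a minimal sketch, assuming a POSIX shell; the function name create_cas_dirs and its optional root argument are illustrative and not part of any Red Hat tool.

```shell
# Create the working directories from the example layout under a chosen root.
# NOTE: create_cas_dirs and the optional root argument are illustrative names.
create_cas_dirs() {
    root=${1:-/cas}    # defaults to the /cas layout used in this article
    for sub in cores kernels output scripts; do
        mkdir -p "$root/$sub" || return 1
    done
}
```

Running create_cas_dirs with no argument (as root) creates /cas/cores, /cas/kernels, /cas/output and /cas/scripts. Running df -h /cas afterwards shows how much space is available for cores and kernel files.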
4. Kernel-specific vmlinux files matching the systems where the core was generated must be available on the centralized system for crash analysis. This can be accomplished manually or scripted depending on the number of different kernels needed. It may be necessary to manually create disabled rhel-*-server-debug repos for other versions of RHEL if using the example method below.
a. Download the kernel-specific debuginfo RPM
[root@rhel6 kernels]# yumdownloader --disablerepo=\* --enablerepo=rhel-5-server-debug-rpms kernel-debuginfo-2.6.18-371.9.1.el5.x86_64
b. Extract the vmlinux file from the RPM using rpm2cpio
[root@rhel6 kernels]# rpm2cpio kernel-debuginfo-2.6.18-371.9.1.el5.x86_64.rpm | cpio -idv './usr/lib/debug/lib/modules/*/vmlinux'
c. If specifying the kernel version in the cpio path, note that the directory inside the RPM may be named $kernelversion.$arch or just $kernelversion, depending on the RHEL version. The following command can be used to find the specific path if needed.
[root@rhel6 kernels]# rpm2cpio kernel-debuginfo-2.6.18-371.9.1.el5.x86_64.rpm | cpio -idvt | grep vmlinux
d. The RPM can be deleted to save space if needed.
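Steps b and c can be wrapped in two small helpers. This is a sketch under stated assumptions: the function names extract_vmlinux and find_vmlinux are hypothetical, the default architecture of x86_64 is an assumption, and the rpm2cpio and cpio invocation follows the example in step b.

```shell
# Extract only the vmlinux file from a kernel-debuginfo RPM into kernels_dir.
# NOTE: extract_vmlinux and find_vmlinux are illustrative helper names.
extract_vmlinux() {
    rpm_file=$1
    kernels_dir=$2
    # Make the RPM path absolute, because the extraction runs in kernels_dir.
    case $rpm_file in
        /*) ;;
        *) rpm_file=$PWD/$rpm_file ;;
    esac
    ( cd "$kernels_dir" &&
      rpm2cpio "$rpm_file" | cpio -id './usr/lib/debug/lib/modules/*/vmlinux' )
}

# Print the path of an extracted vmlinux for a kernel version, trying both
# directory naming forms ($kernelversion and $kernelversion.$arch).
find_vmlinux() {
    kernels_dir=$1
    version=$2
    arch=${3:-x86_64}    # assumed default architecture
    for candidate in \
        "$kernels_dir/usr/lib/debug/lib/modules/$version/vmlinux" \
        "$kernels_dir/usr/lib/debug/lib/modules/$version.$arch/vmlinux"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, after extract_vmlinux kernel-debuginfo-2.6.18-371.9.1.el5.x86_64.rpm /cas/kernels, calling find_vmlinux /cas/kernels 2.6.18-371.9.1.el5 prints the path to pass to crash, or returns a non-zero status if no matching vmlinux has been extracted.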
5. An input file can be very helpful when using the crash utility. Here is an example of a file that would need to be modified to match local directory structure and use:
[root@rhel6 cores]# cat /cas/scripts/crash-input.txt
!mkdir /cas/output/tmp 2>/dev/null
sys > /cas/output/tmp/sys
bt > /cas/output/tmp/bt
bt -a > /cas/output/tmp/bt-a
ps > /cas/output/tmp/ps
runq > /cas/output/tmp/runq
log > /cas/output/tmp/log
kmem -i > /cas/output/tmp/kmem-i
kmem -f > /cas/output/tmp/kmem-f
mod > /cas/output/tmp/mod
swap > /cas/output/tmp/swap
mount > /cas/output/tmp/mount
!tar zcf /cas/output/crash-analysis-$(date '+%Y%m%d_%H%M%S').tar.gz /cas/output/tmp 2>/dev/null
!echo -e "#"
!echo -e "# Please attach the generated '/cas/output/crash-analysis-DATE.tar.gz'"
!echo -e "# and a sosreport from the system that crashed to a Red Hat support case at"
!echo -e "# https://access.redhat.com/support/cases/"
!echo -e "#"
!echo -e "# Once files are uploaded please provide a case comment containing any additional"
!echo -e "# details about the issue as well as the following:"
!echo -e " Please find attached crash analysis archive (which contains basic information"
!echo -e " from the vmcore) along with the sosreport from the affected system.\n"
!echo -e " Bandwidth is limited and transfer of the complete vmcore may take"
!echo -e " prohibitively long. Please let us know what other information we can provide.\n"
quit
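When cores from several systems are analyzed, it can help to generate an input file per core so that each run writes to its own output directory. The following is a minimal sketch, assuming a POSIX shell; the function name make_crash_input is hypothetical, and the output file naming (the crash command with spaces removed) mirrors the example file above.

```shell
# Print a crash input file that redirects each given crash command to a file
# under outdir, then quits.
# NOTE: make_crash_input is an illustrative helper, not part of crash.
make_crash_input() {
    outdir=$1
    shift
    printf '!mkdir -p %s 2>/dev/null\n' "$outdir"
    for cmd in "$@"; do
        # "bt -a" becomes "bt-a", matching the naming in the example file.
        name=$(printf '%s' "$cmd" | tr -d ' ')
        printf '%s > %s/%s\n' "$cmd" "$outdir" "$name"
    done
    printf 'quit\n'
}
```

For example, make_crash_input /cas/output/host1 sys bt "bt -a" log > /cas/scripts/crash-input-host1.txt writes an input file for a system labeled host1 (a placeholder name). The tar and echo lines from the example file can be appended by hand if needed.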
6. Finally, the crash utility can be run using the input file, extracted vmlinux file and vmcore from the remote system to generate a set of analysis files for upload to Red Hat. Crash can also be run manually to provide specific results or Red Hat staff can gather additional information over a remote session.
[root@rhel6 cores]# crash -s -i /cas/scripts/crash-input.txt /cas/kernels/usr/lib/debug/lib/modules/2.6.18-371.9.1.el5/vmlinux /cas/cores/vmcore-rhel5
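Because a mistyped path otherwise only shows up as an error from crash, a small wrapper can check the three files first. This is a sketch only; the function name run_crash_analysis is hypothetical, and the crash invocation is the same one shown in the step above.

```shell
# Check that the input file, vmlinux and vmcore are readable, then run crash
# with the same options as the example command.
# NOTE: run_crash_analysis is an illustrative wrapper name.
run_crash_analysis() {
    input_file=$1
    vmlinux=$2
    vmcore=$3
    for f in "$input_file" "$vmlinux" "$vmcore"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
    crash -s -i "$input_file" "$vmlinux" "$vmcore"
}
```

The wrapper takes the same three paths as the command above, in the same order, and stops at the first file that cannot be read.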
- For more information on the crash utility:
- Depending on frequency of use, this process can be scripted to better handle vmlinux file download and extraction, as well as labeling, storage, and cleanup of files.
NOTE: As of RHEL 6.6, most of the steps above can be replaced with redhat-support-tool. Creating an input file and storage directory becomes optional depending on the use case. There is currently a bug in redhat-support-tool that prevents this from working properly when analyzing vmcores from different RHEL versions.
The btextract functionality built into redhat-support-tool can detect and install the necessary kernel-debuginfo files and accepts additional options. Here is an example of running redhat-support-tool btextract non-interactively from the CLI, using an input file to obtain the same results as above.
[root@rhel6 output]# redhat-support-tool btextract -i /path/to/crash-input.txt /path/to/vmcore
- For more information on the redhat-support-tool: Red Hat Access: Red Hat Support Tool
The current redhat-support-tool btextract usage:
[root@rhel6 output]# redhat-support-tool btextract --help
Usage: btextract [options] </path/to/vmcore>
Use the 'btextract' command get a kernel stack backtrace and other related information from a
kernel core dump file. The default behavior is to issue 'bt -a'; however, there are a variety of
other 'crash' commands that can be run.
Options:
  -h, --help            show this help message and exit
  -c CASENUMBER, --case=CASENUMBER
                        Add the collected data as a comment to the provided
                        case.
  -a, --all             Run all options. Equals -aeflpFi
  -e, --exframe         Search the stack for possible kernel and user mode
                        exception frames (ie. bt -e).
  -f, --foreachbt       Display the stack traces for all tasks (ie. foreach
                        bt).
  -l, --log             Dumps the kernel log_buf contents in chronological
                        order.
  -p, --ps              Displays process status for selected processes in the
                        system.
  -F, --files           Displays information about open files.
  -i CMDFILE, --cmdfile=CMDFILE
                        Run a sequence of individual 'crash' commands from a
                        file.
Examples:
  - btextract /var/crash/vmcore
- Once the crash analysis server is configured, refer to the following article to collect some vmcore pre-analysis information - click here
Root Cause
- Bandwidth is not sufficient to transfer vmcore files in a timely manner.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.