A Guide for Troubleshooting a Segfault
What is a segfault?
A segmentation fault (also known as a segfault or a segmentation violation) occurs when a process attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system). On Unix family operating systems, a signal called SIGSEGV - signal #11, defined in the system header file signal.h - is then sent to the process. The default action for SIGSEGV is abnormal termination: the process ends and an application core file may be written (depending on the system's configuration).
On some architectures (notably x86_64), the kernel logs a message to the kernel ring buffer when a segfault is generated.
Why does a segfault occur?
A segmentation fault can occur under the following circumstances:
1. A bug (software defect) in the program or command is encountered, for example a buffer overflow (an attempt to access memory beyond the end of an array). This can typically be resolved by applying errata or vendor software updates.
2. A hardware problem affects the virtual memory subsystem. For example, a RAM DIMM or CPU cache is defective.
3. An attempt is made to execute a program that was not compiled/built correctly.
What does a segfault signify?
A segfault typically signifies an error in one particular process/program. It does not signify an error in the kernel.
The kernel only detects the program's error and (on some architectures) prints information such as the process name and PID to the log (in this example, the process name is fmg and the PID is 6335):
Nov 27 15:26:19 machine kernel: fmg[6335]: segfault at 00000000ffffd2dc rip 00000000ffffd2dc rsp 00000000ffffd1bc error 15
If multiple unrelated processes are seen to segfault, then it is likely that a hardware issue is affecting the virtual memory subsystem. Refer to "How to check if system RAM is faulty in Red Hat Enterprise Linux?" for suggestions in this case.
What does the kernel message mean, in detail?
- The rip value is the instruction pointer register value, the rsp is the stack pointer register value.
- The error value is a bit mask of page fault error code bits (from arch/x86/mm/fault.c):
* bit 0 == 0: no page found, 1: protection fault
* bit 1 == 0: read access, 1: write access
* bit 2 == 0: kernel-mode access, 1: user-mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
- Here is the error bit definition:
enum x86_pf_error_code {
    PF_PROT = 1 << 0,
    PF_WRITE = 1 << 1,
    PF_USER = 1 << 2,
    PF_RSVD = 1 << 3,
    PF_INSTR = 1 << 4,
};
- These values evaluate to decimal numbers as follows:
$ echo $((1 << 0))
1
$ echo $((1 << 1))
2
$ echo $((1 << 2))
4
$ echo $((1 << 3))
8
$ echo $((1 << 4))
16
- Convert decimal to binary as follows:
$ echo 'ibase=10;obase=2; 1' |bc
1
$ echo 'ibase=10;obase=2; 2' |bc
10
$ echo 'ibase=10;obase=2; 4' |bc
100
$ echo 'ibase=10;obase=2; 8' |bc
1000
$ echo 'ibase=10;obase=2; 16' |bc
10000
- In this case we can see 'error 15'.
Nov 27 15:26:19 machine kernel: fmg[6335]: segfault at 00000000ffffd2dc rip 00000000ffffd2dc rsp 00000000ffffd1bc error 15
- Thus the error value is 1111 in binary.
$ echo 'ibase=10;obase=2; 15' |bc
1111
- Finally, each digit of 1111 maps to an error code bit as follows:
01111
^^^^^
||||+---> bit 0
|||+----> bit 1
||+-----> bit 2
|+------> bit 3
+-------> bit 4
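The decoding steps above, combined into one script, as an added example that is not part of the original article; the function names segfault_error_field and decode_pf_error are hypothetical. The walk-through reads the error value as decimal, while the kernel's printk format for this field is %lx (hexadecimal), so the base is an explicit argument and the script prints both readings of the logged line.

```shell
#!/bin/sh
# Added example: decode the "error" field of a kernel segfault line with the
# x86_pf_error_code bits listed above. Function names are hypothetical.

# Print the token that follows the word "error" in a segfault log line.
segfault_error_field() (
    set -f                      # no globbing while the line is word-split
    prev=
    for word in $1; do
        if [ "$prev" = error ]; then
            printf '%s\n' "$word"
            return 0
        fi
        prev=$word
    done
    return 1                    # no error field found
)

# decode_pf_error VALUE BASE   (BASE is 10 or 16; anything else returns 2)
# VALUE is validated before it reaches shell arithmetic.
decode_pf_error() (
    value=$1
    case $2 in
        10) case $value in ''|0?*|*[!0-9]*) return 2 ;; esac; n=$value ;;
        16) case $value in ''|*[!0-9a-fA-F]*) return 2 ;; esac; n=$((0x$value)) ;;
        *)  return 2 ;;
    esac
    if [ $(($n & 1)) -ne 0 ]; then out="protection fault"; else out="no page found"; fi
    if [ $(($n & 2)) -ne 0 ]; then out="$out, write access"; else out="$out, read access"; fi
    if [ $(($n & 4)) -ne 0 ]; then out="$out, user-mode access"; else out="$out, kernel-mode access"; fi
    if [ $(($n & 8)) -ne 0 ]; then out="$out, use of reserved bit detected"; fi
    if [ $(($n & 16)) -ne 0 ]; then out="$out, fault was an instruction fetch"; fi
    printf '%s\n' "$out"
)

# Log line quoted in the article.
LINE='Nov 27 15:26:19 machine kernel: fmg[6335]: segfault at 00000000ffffd2dc rip 00000000ffffd2dc rsp 00000000ffffd1bc error 15'

err=$(segfault_error_field "$LINE")
printf 'error field:     %s\n' "$err"
printf 'read as decimal: %s\n' "$(decode_pf_error "$err" 10)"
printf 'read as hex:     %s\n' "$(decode_pf_error "$err" 16)"
```

In the logged line the faulting address equals rip, which fits the instruction-fetch bit that is set only in the hexadecimal reading.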
What is required for analysis of a segfault?
For in-depth analysis of a segfault, an application core dump will typically be needed. Refer to the following articles for guidance on how to capture an application core dump:
- How to collect core dump file of a crashing program that is shipped in Red Hat Enterprise Linux 6 and above ?
- How do I enable core file dumps when my application crashes or segmentation faults? (for older releases)
Analysis requires a debugger program such as gdb. Using these tools is a specialist operation usually best done by the application developer, and can require the compilation of the application with extra information contained specifically for the debugger program.
What to do when the application segfaulting is a third party application?
If the application is provided by a third party, please engage the vendor to begin troubleshooting the issue and escalate to Red Hat Support if required.
Notes
The kernel message indicating an application segfault was added for the x86_64 architecture in RHEL5. In RHEL6 it is present for both 32-bit and 64-bit x86 architectures.