Java application down due to JVM crash

Environment

  • Java
  • OpenJDK
  • Sun JDK
  • IBM JDK
  • JRockit

Issue

  • Java application server down due to JVM crash
  • JBoss down due to JVM crash
  • Java process crash in production with core dumps
  • JVM segfault
  • JBoss EAP instance crashed when deploying an application. The application is deployed on 3 JBoss nodes, and when it is used, the instance crashes and produces an hs_err_pid.log file containing an error. What is the cause?
  • When the JVM crashes, an hs_err_pidXXXX.log file is generated.

Resolution

  • Involve the JVM vendor and/or JNI developers to troubleshoot the issue.
  • See the resolutions for the issues listed in the Root Cause section.
  • Involve the instrumentation agent vendor to see if there are any settings or configuration to avoid the issue.

Root Cause

A segfault in the JVM can be caused by a number of reasons and must be diagnosed correctly to identify the root cause. Listed below are a number of the commonly diagnosed problems with a root cause of the JVM crashing.

Diagnostic Steps

General Recommendations

  • Verify that the JVM really crashed. Did the JVM exit and create a fatal error log or javacore? Or did the application freeze or become unresponsive? Verify that the Java process is no longer alive (e.g. using jps -lv, ps, etc.). If the JVM process is still alive, see Java application unresponsive.
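
    The liveness check above can be sketched as a small shell helper. This is illustrative only; the PID 12345 is a placeholder for the real Java process id found via jps -lv or ps:

    ```shell
    # Sketch: confirm the Java process is really gone before treating this as a crash.
    # 12345 is a placeholder PID; substitute the real one from jps -lv or ps.
    is_alive() {
        ps -p "$1" >/dev/null 2>&1
    }

    if is_alive 12345; then
        echo "process still running - this is a hang, not a crash"
    else
        echo "process exited - look for a fatal error log or core file"
    fi
    ```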

  • Install the java debugging symbols. This will provide more detailed information in the fatal error log.

  • Has the application been running in production for a long time without issue, or has it been recently deployed? If running for a long time in production, something must have changed recently to cause the issue (e.g. JVM upgrade, application upgrade, etc.).

  • When does the issue happen? For example, does it happen on startup, after a certain amount of uptime, when a specific use case is executed, etc.?

  • Check the JVM options listed in the fatal error log or javacore to see if any instrumentation agents are being used. For example:

      -javaagent:/opt/jboss/wily/Agent.jar
    
  • Check the JVM options in the fatal error log or javacore to see if any debugging instrumentation is enabled. For example:

      -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n
    

    or

      -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:4770
    
  • Test with any instrumentation agents and/or debugging instrumentation disabled.
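
    The option checks above can be scripted. A minimal sketch, run against a fabricated hs_err excerpt (real fatal error logs carry the options on the jvm_args line):

    ```shell
    # Fabricated hs_err excerpt (real logs list the options on the "jvm_args:" line):
    cat > hs_err_sample.log <<'EOF'
    jvm_args: -Xms1303m -Xmx1303m -javaagent:/opt/jboss/wily/Agent.jar -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:4770
    EOF

    # Pull out any instrumentation/debug options present:
    grep -E -o -- '-(javaagent|agentlib|agentpath|Xrun)[^ ]*' hs_err_sample.log
    ```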

  • Test with the latest JVM release.

  • Does the application use JNI code? The fatal error could be caused by an exception thrown in the JNI code (e.g. UnsatisfiedLinkError). In that case the fatal error log may not provide any clues to the real issue. Turn up logging on any Java classes calling JNI code, and check the application log (e.g. JBoss server.log) for exceptions.

  • Add the -Xcheck:jni JVM option to help find issues with JNI code. The output goes to standard out, so standard out will need to be logged.
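
    One way to enable the option without editing the launch scripts directly is a JAVA_OPTS append in the configuration file, a sketch assuming an EAP 6+ layout where bin/standalone.conf is sourced at startup:

    ```shell
    # Sketch for bin/standalone.conf (EAP 6+ assumption): append -Xcheck:jni.
    # ${JAVA_OPTS:-} guards against JAVA_OPTS being unset at this point.
    JAVA_OPTS="${JAVA_OPTS:-} -Xcheck:jni"
    ```

    Remember that the -Xcheck:jni warnings go to standard out, which the launch scripts must capture.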

  • If the issue is in native code, get a backtrace from a debugger when the issue happens. How to do this depends on the JVM and the OS. See the JVM sections below for details.

  • To get a cleaner backtrace, add the -Xint JVM option to disable the JIT compiler; JIT-compiled code and code without debug symbols otherwise show up as undefined frames. For example, this is how such a frame looks in a gdb backtrace:

          #4  0xb44274aa in ?? ()
    
  • Does the issue happen on every deployed instance, or just some? If it happens on one deployment but not another, investigate environment differences (e.g. JVM version, OS version,  etc.). For example, compare environment information in the JBoss boot.log or server.log.

  • Does the crash happen only with one Java application? Is the stack at the time of the crash consistent? If not, investigate hardware and/or virtualization (e.g. VMWare) issues.

OpenJDK / Sun JDK

  • Check if a fatal error log was created. If the -XX:ErrorFile=... JVM option is set (e.g. -XX:ErrorFile=/var/log/java/java_error%p.log), the fatal error log is generated at the specified path. If the option is not specified, the fatal error log is generated as a file named hs_err_pid<pid>.log (where <pid> is the JBoss java process id) under the process' current working directory. The current working directory can be found from the user.dir system property in boot.log or server.log, or by executing "ls -l /proc/<pid>/cwd" or "lsof -p <pid> | grep cwd" (typically $JBOSS_HOME/bin/). In addition, the fatal error log header is sent to standard out.
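
    On Linux, that working-directory lookup can be sketched as follows. Here the current shell's own PID stands in for the real JBoss java PID, so the paths are illustrative:

    ```shell
    # Locate where hs_err_pid<pid>.log would be written (Linux /proc lookup).
    # $$ (the current shell) stands in for the real JBoss java PID.
    pid=$$
    cwd=$(readlink /proc/"$pid"/cwd)
    echo "fatal error log would appear as: $cwd/hs_err_pid$pid.log"
    ls -l "$cwd"/hs_err_pid*.log 2>/dev/null || echo "no fatal error log present there"
    ```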

  • If a fatal error log is expected but not there, check the common reasons one would not be created (e.g. the process user lacks write access to the target directory, the filesystem is full, or the crash occurred before the JVM could write the log).

  • Analyze the fatal error log to understand where the issue is originating from (e.g. native code, Java bytecode, etc.), JVM version, OS, hardware (e.g. memory, cpu), heap usage, etc.

  • Search the OpenJDK Bug System for known issues.

  • Many issues can be identified by analyzing the fatal error log without resorting to core analysis. If fatal error log analysis is inconclusive, do core analysis. Ensure the core is created with the proper debug symbols installed. If possible, do the analysis on the box where the core was created to avoid issues from environmental differences. Attach to the core and obtain a backtrace to see what the JVM was doing when the crash occurred.

  • Analyze the gc logging leading up to the crash and look for clues about JVM health when the issue happened (e.g. under stress trying to free heap or perm/metaspace, evidence of cpu starvation, use of non-standard gc/JVM options).

  • To get a backtrace on Linux, use gcore to create a core dump when the issue happens so it can be analyzed offline. gcore comes with the gdb package. Add the -XX:OnError="gcore -o /desired/path/for/core %p" JVM option.
    On early versions of JBoss, the -XX:OnError option appears to work only if it is placed right after the java command. For example, in EAP 5 it needs to be added to run.sh, not run.conf. There is this block of code at the end of run.sh:

    while true; do
           if [ "x$LAUNCH_JBOSS_IN_BACKGROUND" = "x" ]; then
              # Execute the JVM in the foreground
              eval \"$JAVA\" $JAVA_OPTS \
                 -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" \
                 -classpath "$JBOSS_CLASSPATH" \
                 org.jboss.Main "$@" >> jboss.trace
              JBOSS_STATUS=$?
           else
              # Execute the JVM in the background
              "$JAVA" $JAVA_OPTS \
    

    Update it and add the -XX:OnError option in the foreground or background block (or both). For example:

    while true; do
           if [ "x$LAUNCH_JBOSS_IN_BACKGROUND" = "x" ]; then
              # Execute the JVM in the foreground
              eval \"$JAVA\" -XX:OnError="\"gcore %p\"" $JAVA_OPTS \
                 -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" \
                 -classpath "$JBOSS_CLASSPATH" \
                 org.jboss.Main "$@" >> jboss.trace
              JBOSS_STATUS=$?
           else
              # Execute the JVM in the background
              eval \"$JAVA\" -XX:OnError="\"gcore %p\"" $JAVA_OPTS \
    

    Note that in earlier versions of JBoss eval did not precede the $JAVA command. As such the following block should be used:

    while true; do
           if [ "x$LAUNCH_JBOSS_IN_BACKGROUND" = "x" ]; then
              # Execute the JVM in the foreground
              "$JAVA" -XX:OnError="gcore %p" $JAVA_OPTS \
                 -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" \
                 -classpath "$JBOSS_CLASSPATH" \
                 org.jboss.Main "$@" >> jboss.trace
              JBOSS_STATUS=$?
           else
              # Execute the JVM in the background
              "$JAVA" -XX:OnError="gcore %p" $JAVA_OPTS \
    

    For domain mode on EAP 6, set OnError as an option in your server's jvm element:

    <option value="-XX:OnError=gcore -o /desired/path/for/corefile %p"/>
    

    Verify the option was applied by checking the process line for JBoss with the ps or jps -mlv command. The core dump will be created under the process' current working directory, which can be found from the user.dir system property in boot.log or server.log, or by executing "ls -l /proc/<pid>/cwd" or "lsof -p <pid> | grep cwd" (typically $JBOSS_HOME/bin/). The gcore -o option can be used to specify the core path and file name. For example: gcore -o /home/jboss/core %p.

  • If you want to test that the OnError flag is working, you need to force a fatal error to make the JVM crash. You could try the attached app for that purpose, which should crash most SunJDK releases. Deploy the app and then access the /crashtestdummy/thismaycrash.jsp page to trigger a crash. Be sure this is done only in a test environment that you are prepared to let crash.

  • Attach to the core on the box where it was created, to avoid environmental issues and get the best backtrace possible (be sure the path to java is the same java binary used to start JBoss). For example:

    gdb /usr/bin/java core.14419
    

RHEL Recommendations

**NOTE:** If running OpenJDK on Red Hat Enterprise Linux (RHEL), install the java `debuginfo` packages before getting the backtrace. This will provide a more detailed stack for native code. The following commands install the java debuginfo packages (`debuginfo-install` is provided by the `yum-utils` package):

    RHEL6:
    $ debuginfo-install java-(1.6.0|1.7.0|1.8.0)-openjdk

    RHEL7:
    $ debuginfo-install java-(1.6.0|1.7.0|1.8.0|11)-openjdk

    RHEL8:
    $ debuginfo-install java-(1.8.0|11)-openjdk
    $ debuginfo-install java-(1.8.0|11)-openjdk-headless

The gdb debugger will be launched, and you should see the gdb debugger command line. Issue the following commands from the gdb debugger command line to get a backtrace of all threads logged to gdb.txt in the directory where gdb was started:

    (gdb) set height 0
    (gdb) set logging on 
    (gdb) thread apply all bt
    (gdb) set logging off
    (gdb) quit

The last thread in gdb.txt will be the thread running when the crash happened.
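
Extracting that last thread from a long gdb.txt can be scripted. A minimal sketch over a fabricated three-thread excerpt (real output carries full frame lists per thread):

```shell
# Fabricated gdb.txt excerpt; thread ids and frames are illustrative only.
cat > gdb.txt <<'EOF'
Thread 3 (Thread 0xb7f (LWP 14421)):
#0  0x00be7402 in __kernel_vsyscall ()
Thread 2 (Thread 0xb6e (LWP 14420)):
#0  0x00be7402 in __kernel_vsyscall ()
Thread 1 (Thread 0xb5d (LWP 14419)):
#0  0x00b2d7a2 in os::Linux::chained_handler ()
EOF

# Print the header line of the last thread in the file:
grep '^Thread ' gdb.txt | tail -n 1
```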

  • If the backtrace is inconclusive, use jstack to attach to the core and get a thread dump, then cross-reference the thread id from the core backtrace with the thread id in the thread dump. For example, use one of the commands below, depending on architecture and desired thread dump format:

          JDK6 - JDK8:
          32-bit JVM - mixed mode (both Java and native C/C++ frames):
          jstack -m /usr/bin/java core.pid > jstack.out 2>&1
          
          32-bit JVM - long listing (with additional lock information):
          jstack -l /usr/bin/java core.pid > jstack.out 2>&1
          
          64-bit JVM  - mixed mode (both Java and native C/C++ frames):
          jstack -J-d64 -m /usr/bin/java core.pid > jstack.out 2>&1
          
          64-bit JVM  - long listing (with additional lock information):            
          jstack -J-d64 -l /usr/bin/java core.pid > jstack.out 2>&1
    
          JDK11:
          Long listing (with additional lock information):       
          jhsdb jstack --locks --core core.pid --exe /usr/bin/java  > jstack.out 2>&1
    
          Mixed mode (both Java and native C/C++ frames):
          jhsdb jstack --mixed --core core.pid --exe /usr/bin/java > jstack.out 2>&1
    
  • The /usr/bin/java must be the same version that created the core, or you will get "DebuggerException: Can't attach to the core file".

  • If jstack runs a long time or seemingly without end, this suggests the JVM crashed due to a StackOverflowError/segmentation fault stemming from very deep or endless recursion. See JBoss crashes with 'Segmentation fault'.

  • Get a sosreport.

  • Check the Manufacturer and Product Name from sosreport/dmidecode to see if the hardware is certified: https://catalog.redhat.com/hardware. If the issue is caused by uncertified hardware, then per the Support Policy for Uncertified Systems and Configurations, the issue may need to be reproduced on certified hardware or referred to the hardware vendor.

  • Test memory by running memtest86+ or memtester.

IBM Recommendations

  • Run the following command and check the various dump functions and locations in the output for dump files like javacore, heap dump, etc.:

      java -Xdump:what -version
      Registered dump agents
      ----------------------
      dumpFn=doSystemDump
      events=gpf+abort
      filter=
      label=/home/jboss/bin/core.%Y%m%d.%H%M%S.%pid.%seq.dmp
      range=1..0
      priority=999
      request=serial
      opts=
      ----------------------
      dumpFn=doSnapDump
      events=gpf+abort
      filter=
      label=/home/jboss/bin/Snap.%Y%m%d.%H%M%S.%pid.%seq.trc
      range=1..0
      priority=500
      request=serial
      opts=
      ----------------------
      dumpFn=doSnapDump
      events=systhrow
      filter=java/lang/OutOfMemoryError
      label=/home/jboss/bin/Snap.%Y%m%d.%H%M%S.%pid.%seq.trc
      range=1..4
      priority=500
      request=serial
      opts=
      ----------------------
      dumpFn=doHeapDump
      events=systhrow
      filter=java/lang/OutOfMemoryError
      label=/home/jboss/bin/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd
      range=1..4
      priority=40
      request=exclusive+compact+prepwalk
      opts=PHD
      ----------------------
      dumpFn=doJavaDump
      events=gpf+user+abort
      filter=
      label=/home/jboss/bin/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt
      range=1..0
      priority=10
      request=exclusive
      opts=
      ----------------------
      dumpFn=doJavaDump
      events=systhrow
      filter=java/lang/OutOfMemoryError
      label=/home/jboss/bin/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt
      range=1..4
      priority=10
      request=exclusive
      opts=
      ----------------------
      
      java version "1.5.0"
      Java(TM) 2 Runtime Environment, Standard Edition (build pxa64dev-20090707 (SR10 ))
      IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20090707 (JIT enabled)
      J9VM - 20090706_38445_LHdSMr
      JIT  - 20090623_1334_r8
      GC   - 200906_09)
      JCL  - 20090705
    
  • Review any files created by the dump agents.

  • A javacore is equivalent to an OpenJDK or Sun JDK thread dump, with additional garbage collection and environment information. It can be analyzed with the IBM Thread and Monitor Dump Analyzer tool: <http://www.alphaworks.ibm.com/tech/jca>.

  • Snap trace analysis requires using the IBM Trace Formatter: <http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/diag/tools/trace_formatter.html>. For example:

    java -Xtrace:format=/path/to/ibm/jre/lib com.ibm.jvm.format.TraceFormat Snap.%Y%m%d.%H%M%S.%pid.%seq.trc 
    
    • /path/to/ibm/jre/lib is the directory where the TraceFormat.dat and J9TraceFormat.dat (Java 8) formatting files are located (e.g. /etc/alternatives/jre_1.8.0_ibm/lib/ on Red Hat Enterprise Linux (RHEL) 8).
    • The output file (e.g. Snap.%Y%m%d.%H%M%S.%pid.%seq.trc.fmt) will contain the formatted data.
  • System dump cores can be analyzed with jextract or by attaching to them with a debugger (e.g. dbx). jextract is preferred because it has the advantage of understanding Java frames and JVM control blocks.

  • Use jextract on the system dump core as follows:

          JAVA_HOME/jre/bin/jextract <core.dump>
    
  • jextract sends its output to a file called dumpfilename.xml that contains JVM internal information useful for troubleshooting.

  • jextract location on RHEL 5: /usr/lib/jvm/jre-1.6.0-ibm.x86_64/bin/jextract

  • It is best to run jextract on the same system that produced the dump, but it can be run on a different system with the same JRE version.

  • Attach to an AIX system dump core using dbx on the box where it was created to avoid environment issues and get the best backtrace possible (be sure the path to java is the same java used to start JBoss). For example:

          dbx /path/to/java core.%Y%m%d.%H%M%S.%pid.%seq.dmp
    
  • The dbx debugger will be launched, and you should see the dbx command line. Issue the following command from the dbx command line to get a backtrace, then copy and paste the console output into the case, or save it to a file and attach that to the case.

          (dbx) where
    

Linux Recommendations

  • To trace any signals that the JVM may be receiving, run

    strace -f -q -e trace=none java ...
    

    If it is receiving an unexpected signal like SIGTERM or SIGKILL, you can use SystemTap with Sigmon to locate the source.

  • An example SystemTap script, parenttrace.stp, is attached to the article. To use it, ensure that the appropriate kernel-debuginfo package is installed on the server, then run the following until the issue occurs:

    stap <script> | xargs -l 1 logger -p kern.warn
    
  • This should help pinpoint where the signal is coming from with a line similar to the following:

    SIGKILL sent by problematic_script[27692] who was forked from <- java[23863] <- java[666] <- domain.sh[8100] <- init[1] <- swapper[0]
    
  • Start the JVM with MALLOC_CHECK_=1 to enable memory checking. It can provide insight into some memory issues that cause heap corruption. To enable this, find this block of code in run.sh:

          while true; do
                   if [ "x$LAUNCH_JBOSS_IN_BACKGROUND" = "x" ]; then
                      # Execute the JVM in the foreground
                      "$JAVA" $JAVA_OPTS \
                         -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" \
                         -classpath "$JBOSS_CLASSPATH" \
                         org.jboss.Main "$@" >> jboss.trace
                      JBOSS_STATUS=$?
                   else
                      # Execute the JVM in the background
                      "$JAVA" $JAVA_OPTS \
    

    Add MALLOC_CHECK_=1 in front of the java command as follows:

          while true; do
                   if [ "x$LAUNCH_JBOSS_IN_BACKGROUND" = "x" ]; then
                      # Execute the JVM in the foreground
                      MALLOC_CHECK_=1 "$JAVA" $JAVA_OPTS \
                         -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" \
                         -classpath "$JBOSS_CLASSPATH" \
                         org.jboss.Main "$@" >> jboss.trace
                      JBOSS_STATUS=$?
                   else
                      # Execute the JVM in the background
                      MALLOC_CHECK_=1 "$JAVA" $JAVA_OPTS \
    

JRockit Recommendations

  • Check if a fatal error log (jrockit.dump) was created. The fatal error log will be in the directory defined by the user.dir system property in boot.log or server.log (typically $JBOSS_HOME/bin/). If one is expected but not there, check that the user has write access to that directory.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.