Java application crashes with 'Segmentation fault'

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
  • Linux
  • CentOS
  • Red Hat Enterprise Linux (RHEL)
  • Sun/Oracle JDK

Issue

  • JBoss crashes with this error in the console output:

          run.sh: line 283: 19837 Segmentation fault      "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"
    
  • We have had serious problems since migrating our production environment from EAP 4.3.0 to EAP 5.0.0 last night. One or two of our cluster nodes are dying with the following message in console.log (Speicherzugriffsfehler is the German-locale text for "Segmentation fault"):

          run.sh: line 283:  9076 Speicherzugriffsfehler "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"
    
  • The following error in the Linux /var/log/messages file:

          kernel: java[17601]: segfault at 00000000498a2ca8 rip 00002aac2144459d rsp 00000000498a2c90 error 6
    
  • After one node in the cluster crashes, the other nodes crash in quick succession.

Resolution

  • See java.lang.StackOverflowError.

  • In some cases, the segmentation fault caused by a StackOverflowError has been avoided by setting a thread stack size of 5120k or higher (for example, -Xss5120k). With these larger stacks, the JVM printed the actual error and stack trace instead of crashing with a segmentation fault.

  • Increase the StackShadowPages JVM setting so that more stack space is reserved for native code:

      -XX:StackShadowPages=20
    

    Note that this leaves less stack for Java-level code, so the entire thread stack (-Xss) may need to be increased so that the amount of stack available to Java-level code does not decrease as a consequence.
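
    As a rough illustration of how the thread stack size bounds recursion depth, the following sketch recurses until StackOverflowError is thrown on two threads with different requested stack sizes. The class and method names (StackSizeDemo, depthWithStack) are hypothetical and only for illustration. The stack size passed to the Thread constructor is a hint that some platforms may ignore, and -Xss sets the default for threads that do not request a size. The exact depths vary by JVM, platform, and JIT state, so no specific numbers should be expected.

```java
// Sketch only: compares the recursion depth reached with two requested stack sizes.
class StackSizeDemo {

    // Recurse until the JVM throws StackOverflowError, then report the depth reached.
    static int depthReached() {
        int[] depth = {0};
        try {
            recurse(depth);
        } catch (StackOverflowError expected) {
            // The thread stack is exhausted; depth[0] holds the last depth counted.
        }
        return depth[0];
    }

    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    // Run depthReached() on a new thread that requests the given stack size in bytes.
    static int depthWithStack(long stackBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> result[0] = depthReached(), "stack-demo", stackBytes);
        t.start();
        t.join(); // join() makes the write to result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("depth with 512k stack:  " + depthWithStack(512L * 1024));
        System.out.println("depth with 5120k stack: " + depthWithStack(5120L * 1024));
    }
}
```

    On a JVM that honors the requested size, the second figure should be noticeably larger than the first. This only shows the relationship between stack size and usable depth; it does not reproduce the native-code crash described in this article.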

Root Cause

  • The Java thread stack size is being exceeded, and the JVM crashes instead of throwing java.lang.StackOverflowError. This behavior appears to be specific to the Sun JDK on Linux: with OpenJDK on Linux and with the Sun JDK on Windows, the JVM does not crash and java.lang.StackOverflowError is thrown.
  • When all nodes in a cluster crash one after another, the issue is request-related (for example, a use case is executed that results in deep recursion or an endless loop) and failover is propagating the request to the other nodes. The request brings down one node, failover passes it to the next node, which is brought down in turn, and so on until all nodes are down.
  • See java.lang.StackOverflowError.
  • Java code has to share the thread stack with native code such as socketWrite. A portion of the stack is reserved specifically for native code, as set by the JVM's StackShadowPages option. If StackShadowPages is too small, a VM or native call can overflow the stack when the StackShadowPages space is all that remains after Java-level recursion has used the rest. If the stack overflow occurs in a native-layer call, the JVM crashes (potentially without an hs_err fatal error log) instead of throwing a Java-level StackOverflowError.
  • Java crashes in SocketOutputStream.socketWrite0 from libnet.so
  • [JVM Bug] JDK-7059899: Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV
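
    The failover cascade described in the second bullet above can be pictured with a small simulation. This is a sketch under stated assumptions, not how JBoss clustering is implemented: Node, handle, and dispatchWithFailover are hypothetical names, and a "poison" request simply marks the node as dead to stand in for the JVM crash.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a request that crashes every node it reaches, retried by failover.
class FailoverCascadeDemo {

    // Hypothetical stand-in for a cluster node.
    static class Node {
        final String name;
        boolean alive = true;

        Node(String name) {
            this.name = name;
        }

        // Returns true if the request was handled. A poison request kills the
        // node instead, standing in for the segmentation fault.
        boolean handle(boolean poison) {
            if (poison) {
                alive = false;
                return false;
            }
            return true;
        }
    }

    // Retry the same request on each live node in turn; return the nodes it crashed.
    static List<String> dispatchWithFailover(List<Node> cluster, boolean poison) {
        List<String> crashed = new ArrayList<>();
        for (Node node : cluster) {
            if (!node.alive) {
                continue;
            }
            if (node.handle(poison)) {
                break; // handled successfully, no failover needed
            }
            crashed.add(node.name); // node died, fail over to the next one
        }
        return crashed;
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<>();
        cluster.add(new Node("node1"));
        cluster.add(new Node("node2"));
        cluster.add(new Node("node3"));
        // Prints [node1, node2, node3]: the same request takes down every node.
        System.out.println(dispatchWithFailover(cluster, true));
    }
}
```

    A normal request (poison set to false) is handled by the first live node and crashes nothing, which is why the pattern points to a specific request rather than to the cluster itself.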

Diagnostic Steps

  1. Verify that a fatal error log (hs_err_pid<pid>.log) was not created.
  2. Get a core dump when the issue happens and analyze the backtrace and/or jstack output to see what the JVM was doing at the time of the crash. See Java application down due to JVM crash.
    • If a core dump is not being created, check that the output of ulimit -c is not 0. If it is 0, core files cannot be created. To enable creation of core files, do one of the following:
      • Run the command ulimit -c unlimited from the terminal. This applies only to the current user session/terminal and does not persist across a server reboot.
      • Configure the core size in /etc/security/limits.conf. Consult the system administrator about how to set this for the user running the Java application.
    • If jstack runs on the core dump for a long time or seemingly without end, this further suggests that the JVM crashed due to a StackOverflowError stemming from very deep or endless recursion.
    • If jstack returns an InvocationTargetException or a VMVersionMismatchException, confirm which jstack binary is being used, for example with the strace command (as in strace jstack). Its version should match the version of the JVM that ran the application.
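
    For step 1, a minimal sketch of checking a directory for fatal error logs follows. It assumes the default file name pattern hs_err_pid<pid>.log and that the directory to check (by default the working directory of the crashed process) is passed as an argument. FatalErrorLogCheck and findFatalErrorLogs are hypothetical names. If the JVM was started with a custom -XX:ErrorFile location, that location should be checked instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch only: lists files in one directory that match the default fatal error log name.
class FatalErrorLogCheck {

    // Return all entries in dir whose file name matches hs_err_pid*.log.
    static List<Path> findFatalErrorLogs(Path dir) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                logs.add(p);
            }
        }
        return logs;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the directory to check, else the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> logs = findFatalErrorLogs(dir);
        if (logs.isEmpty()) {
            System.out.println("No fatal error log found in " + dir.toAbsolutePath());
        } else {
            for (Path log : logs) {
                System.out.println("Fatal error log: " + log);
            }
        }
    }
}
```

    An empty result is consistent with the crash described in this article, where the JVM may exit without writing an hs_err file.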

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.