JBoss EAP mysterious shutdown when using the Tanuki service wrapper
Environment
- JBoss Enterprise Application Platform (EAP) 4.x and 5.x
- JBoss Fuse 6.x
- JBoss A-MQ 6.x
- Fuse ESB Enterprise 7.0
- Java Service Wrapper (Tanuki)
- Windows
Issue
- A JBoss EAP instance appears to be stopping for no apparent reason. It's a stable environment where there have been no changes for several months. Currently JBoss is run as a Windows service through Tanuki.
- The JBoss EAP server is restarted without any obvious reason.
- I'm seeing the following error in the wrapper.log:
STATUS | wrapper | 2013/11/15 12:30:00 | --> Wrapper Started as Daemon
STATUS | wrapper | 2013/11/15 12:30:00 | Launching a JVM...
INFO | jvm 1 | 2013/11/15 12:30:02 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
INFO | jvm 1 | 2013/11/15 12:30:02 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
INFO | jvm 1 | 2013/11/15 12:30:02 |
INFO | jvm 1 | 2013/11/15 12:30:04 | Nov 15, 2013 12:30:04 PM org.apache.karaf.main.SimpleFileLock lock
INFO | jvm 1 | 2013/11/15 12:30:04 | INFO: locking
INFO | jvm 1 | 2013/11/15 12:30:04 | Please wait while Fuse ESB is loading...
ERROR | wrapper | 2013/12/02 00:03:33 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2013/12/02 00:03:33 | JVM did not exit on request, terminated
STATUS | wrapper | 2013/12/02 00:03:34 | JVM exited in response to signal SIGKILL (9).
ERROR | wrapper | 2013/12/02 00:03:34 | Unable to start a JVM
STATUS | wrapper | 2013/12/02 00:03:34 | <-- Wrapper Stopped
Resolution
- Address the root cause of the JVM unresponsiveness, as detailed in the article Java application unresponsive.
Root Cause
- The Tanuki Windows service wrapper is terminating the JVM process because it received no response from it.
- The wrapper pings the JVM every x seconds, as defined by wrapper.ping.interval [1], and then waits for a response within the timeout period defined by wrapper.ping.timeout. The JVM does not respond within this timeout, so it is terminated.
[1] http://wrapper.tanukisoftware.org/doc/english/properties.html
[2] http://wrapper.tanukisoftware.org/doc/english/example.html
- An underlying issue is causing the JVM/JBoss process to be unresponsive (likely garbage collection pauses, high CPU, deadlocks, or other threading issues).
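For reference, the ping behaviour described above is controlled by two properties in wrapper.conf. A minimal sketch; the values shown are the wrapper's illustrative defaults, not values taken from this environment:

```properties
# wrapper.conf (illustrative default values)
# How often the wrapper pings the JVM, in seconds.
wrapper.ping.interval=5
# How long the wrapper waits for a ping response before it
# considers the JVM hung and terminates it, in seconds.
wrapper.ping.timeout=30
```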
Diagnostic Steps
- Check the wrapper log file for the following messages:
ERROR | wrapper | 2010/05/19 18:24:36 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2010/05/19 18:24:36 | JVM did not exit on request, terminated
STATUS | wrapper | 2010/05/19 18:24:41 | Launching a JVM...
INFO | jvm 3 | 2010/05/19 18:24:44 | WrapperManager: Initializing.
- If the log contains the above messages, check the preceding messages for a pointer to what the problem might be. Should a preceding message contain
Caused by: java.lang.OutOfMemoryError: Java heap space
then you need to tune your JVM. See Java application "java.lang.OutOfMemoryError: Java heap space"
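When the JVM runs under the wrapper, heap sizes are usually tuned in wrapper.conf rather than in a start script. A sketch using the standard Tanuki memory properties; the values are placeholders to be adapted to your workload:

```properties
# wrapper.conf (placeholder values)
# Initial and maximum Java heap size, in MB
# (equivalent to -Xms and -Xmx on the java command line).
wrapper.java.initmemory=512
wrapper.java.maxmemory=1024
```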
- If there are no obvious log messages, enable Garbage Collection (GC) logging if you haven't done so already. If in doubt, see the article How do I enable Java garbage collection logging? for more information. Note that the JVM overwrites the gc.log when it is restarted, so the file has to be backed up manually for the data to be preserved. This is difficult to do if Tanuki automatically restarts the process, thus overwriting the gc.log, soon after the unresponsiveness occurs.
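GC logging can be enabled through additional JVM arguments in wrapper.conf. A sketch assuming a HotSpot JDK 6/7-era JVM; the .N indexes are examples and must not collide with any wrapper.java.additional.* entries already in your configuration:

```properties
# wrapper.conf (adjust the .N indexes to follow your
# existing wrapper.java.additional.* entries)
wrapper.java.additional.1=-verbose:gc
wrapper.java.additional.2=-XX:+PrintGCDetails
wrapper.java.additional.3=-XX:+PrintGCDateStamps
wrapper.java.additional.4=-Xloggc:gc.log
```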
- One method to prevent this is to set wrapper.ping.timeout to 0. This disables the wrapper's ping timeout, so it no longer restarts JBoss automatically when a ping goes unanswered, and the gc.log is not overwritten. The only caveat is that if JBoss enters a long-term unresponsive state, you will have to restart it manually to restore functionality. However, allowing it to persist in that state lets you capture much more diagnostic information, such as thread dumps, CPU data, and/or heap dumps, depending on the likely root cause.
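In wrapper.conf this is a one-line change:

```properties
# wrapper.conf
# 0 disables the ping timeout entirely: the wrapper will never
# conclude the JVM is hung, so it will not kill and restart it.
wrapper.ping.timeout=0
```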
- Another way to preserve the gc.log in this scenario is to set wrapper.restart.reload_configuration=TRUE. With this set, the wrapper reloads its configuration every time it restarts the process. You could then set the name of the gc.log, start the process, and, once it is started, change the configured gc.log name to something different (for instance, gc2.log). The next time the wrapper restarts the process, it picks up the new gc.log name and writes output there rather than overwriting the old log.
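A sketch of that combination; the wrapper.java.additional.4 index is an example and should match wherever -Xloggc is defined in your configuration:

```properties
# wrapper.conf
# Re-read this file on every JVM restart, so an edited
# -Xloggc file name takes effect at the next restart.
wrapper.restart.reload_configuration=TRUE
# After the process is up, change gc.log here to e.g. gc2.log;
# the next restart will then write to the new file.
wrapper.java.additional.4=-Xloggc:gc.log
```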
- Investigate threading issues and other causes of unresponsiveness.
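Thread dumps are the primary tool for investigating deadlocks and stuck threads. Besides jstack &lt;pid&gt; (or kill -3 on Unix; on Windows, Ctrl+Break in the JVM's console), a dump can also be captured from inside the JVM. A minimal self-contained sketch of what such a dump contains, using the standard Thread.getAllStackTraces() API (class name is illustrative):

```java
import java.util.Map;

// Minimal sketch: print a thread dump from inside the JVM,
// similar in spirit to the output of jstack.
public class ThreadDump {
    public static void main(String[] args) {
        // Snapshot of every live thread and its current stack.
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.printf("\"%s\" id=%d state=%s%n",
                    t.getName(), t.getId(), t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Collecting several such dumps a few seconds apart (or several jstack runs against the hung PID) shows which threads are stuck in the same place across snapshots.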
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.