JBoss EAP 6 managed servers down after losing contact with host controller
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 6.x
Issue
- The JBoss server node goes down while the host controller and process controller keep running. Because of this we restart the entire JBoss server daily, and we need help resolving the issue.
- JBoss runs in domain mode on a VMware VM, and the server shuts down without any shutdown operation being issued. There is no hs_err_pid*.log file.
- We have a domain with 1 appserver running, and the appserver has died. It was not processing any requests at the time. It seems to have lost connection to the Host Controller. Did it die because of that? If so, can we prevent that?
- JBOSS server stops with JBAS012175.
- Applications raise 'IO exceptions' alarms.
- The JBoss server stops for no apparent reason. It looks like a normal shutdown, but with this error:
14:29:13,363 ERROR [stderr] (main) java.io.IOException: JBAS012175: Channel closed
14:29:13,363 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerConnection.getChannel(HostControllerConnection.java:100)
14:29:13,364 ERROR [stderr] (main) at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:115)
14:29:13,364 ERROR [stderr] (main) at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:98)
14:29:13,364 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerConnection.reConnect(HostControllerConnection.java:168)
14:29:13,364 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerClient.reconnect(HostControllerClient.java:98)
14:29:13,365 ERROR [stderr] (main) at org.jboss.as.server.DomainServerMain.main(DomainServerMain.java:138)
14:29:13,365 ERROR [stderr] (main) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
14:29:13,365 ERROR [stderr] (main) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
14:29:13,365 ERROR [stderr] (main) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
14:29:13,365 ERROR [stderr] (main) at java.lang.reflect.Method.invoke(Method.java:606)
14:29:13,366 ERROR [stderr] (main) at org.jboss.modules.Module.run(Module.java:270)
14:29:13,366 ERROR [stderr] (main) at org.jboss.modules.Main.main(Main.java:411)
- The following log pattern is observed:
[Host Controller] 02:18:06,893 ERROR [org.jboss.as.controller.management-operation] (HttpManagementService-threads - 858) JBAS014612: Operation ("read-resource") failed - address: ([
[Host Controller] {"host" => "testhost"},
[Host Controller] {"server" => "server-three"},
[Host Controller] {"subsystem" => "datasources"},
[Host Controller] {"data-source" => "ExampleDS"}
[Host Controller] ]): java.lang.RuntimeException: java.io.IOException: JBAS012175: Channel closed
.
.
[Server:server-three] 02:20:17,897 ERROR [stderr] (main) java.net.ConnectException: JBAS012174: Could not connect to remote://10.10.10.10:9999. The connection failed
.
.
[Host Controller] 02:19:26,241 ERROR [org.jboss.remoting.remote.connection] (Remoting "example.com:MANAGEMENT" read-1) JBREM000200: Remote connection failed: java.io.IOException: JBREM000201: Received invalid message on Remoting connection 4b050ff7 to /10.10.10.10:52239
.
.
[Server:server-three] 02:20:52,042 INFO [org.jboss.as] (MSC service thread 1-5) JBAS015950: JBoss EAP 6.2.0.GA (AS 7.3.0.Final-redhat-14) stopped in 27001ms
.
.
[Host Controller] 02:21:06,250 INFO [org.jboss.as.host.controller] (ProcessControllerConnection-thread - 2) JBAS010926: Unregistering server server-three
.
.
[Host Controller] 10:02:45,164 INFO [org.jboss.as.host.controller] (Host Controller Service Threads - 49) JBAS010923: Stopping server server-three
[Host Controller] 10:02:45,212 WARN [org.jboss.as.domain] (MSC service thread 1-5) JBAS010929: Connection to remote host "slaveHost" closed unexpectedly
.
.
10:02:52,764 INFO [org.jboss.as.process.Host Controller.status] (Shutdown thread) JBAS012018: Stopping process 'Host Controller'
[Host Controller] 10:02:52,830 INFO [org.jboss.as] (MSC service thread 1-8) JBAS015950: JBoss EAP 6.2.0.GA (AS 7.3.0.Final-redhat-14) stopped in 58ms
10:02:52,865 INFO [org.jboss.as.process.Host Controller.status] (reaper for Host Controller) JBAS012010: Process 'Host Controller' finished with an exit status of 0
Resolution
- Upgrade to EAP 6.3 CP3 or later, where Bug 1140453 is fixed.
- As a workaround, determine what is causing the unresponsiveness and address that trigger. Common fixes are:
  - Relieve heavy CPU contention so that the host controller and the managed server can communicate reliably with each other.
  - Tune the JVM to avoid long GC pauses.
  - Allocate more resources to the virtual machine if you are using virtualization.
  - If using JDK 1.6, upgrade to a newer Java version.
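To judge whether long GC pauses are a plausible trigger, a GC log can be scanned for entries whose wall-clock time exceeds a chosen threshold. The following is a minimal sketch, assuming GC logging is enabled on the managed server and that the log contains HotSpot-style `[Times: user=... sys=..., real=... secs]` entries. The function name, the sample lines, and the thresholds are illustrative assumptions, not values taken from EAP.

```python
import re

# Wall-clock part of a HotSpot "[Times: user=... sys=..., real=... secs]" entry.
# Assumption: the GC log uses this format; adjust the pattern if yours differs.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def find_long_pauses(lines, threshold_secs=5.0):
    """Return (line_number, seconds) for each GC entry at or above the threshold."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in REAL_TIME.finditer(line):
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                hits.append((number, seconds))
    return hits


# Made-up sample lines, for illustration only.
SAMPLE = [
    "[GC [PSYoungGen: 1K->1K(2K)] [Times: user=0.02 sys=0.00, real=0.01 secs]",
    "[Full GC [PSOldGen: 9K->8K(9K)] [Times: user=1.20 sys=0.30, real=35.12 secs]",
]

for number, seconds in find_long_pauses(SAMPLE, threshold_secs=30.0):
    print(f"line {number}: pause of {seconds:.2f}s")
```

To check a real log, pass an open file object instead of `SAMPLE`. A pause close to or above the connection's read timeout is a candidate trigger worth correlating with the timestamps in the host controller log.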
Root Cause
Known issue: Bug 1106393
There is a local connection between a host controller and each managed server. If the host controller or a server instance becomes unresponsive (for example, during a long GC pause), the connection can hit a read timeout, which causes the server to attempt to reconnect. Currently, if the reconnect fails, the server shuts itself down.
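The sequence described above can be summarized as a small decision function. This is a toy model for reasoning about the failure, not EAP code: the function name and parameters are hypothetical, and the 30-second value in the usage lines is an assumed read timeout chosen only to echo the 30-second interval mentioned in the diagnostic notes.

```python
def server_outcome(pause_secs, read_timeout_secs, reconnect_ok):
    """Toy model of the described behaviour; all names here are hypothetical."""
    if pause_secs < read_timeout_secs:
        # The unresponsive period ends before the read timeout fires.
        return "running"
    if reconnect_ok:
        # Read timeout fired, but the reconnect attempt succeeded.
        return "reconnected"
    # Read timeout fired and the reconnect failed: the server stops itself.
    return "shutdown"


print(server_outcome(5, 30, reconnect_ok=False))
print(server_outcome(45, 30, reconnect_ok=True))
print(server_outcome(45, 30, reconnect_ok=False))
```

The third case is the one reported in this article: a clean-looking shutdown with exit status 0 and no crash file, caused by the failed reconnect and not by an administrator's request.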
Diagnostic Steps
- In $JBOSS_HOME/domain/log/host-controller.log, you can see that the connection to a managed server failed and that the server was then unregistered:
14:35:48,001 ERROR [org.jboss.remoting.remote.connection] (Remoting "host01:MANAGEMENT" read-1) JBREM000200: Remote connection failed: org.xnio.channels.ReadTimeoutException: Read timed out
14:38:27,876 INFO [org.jboss.as.host.controller] (ProcessControllerConnection-thread - 2) JBAS010926: Unregistering server host01-Server1
- In $JBOSS_HOME/domain/log/process-controller.log, you can find the managed server's exit log entry:
14:35:54,608 INFO [org.jboss.as.process.Server:Server-one.status] (reaper for Server:jo007006) JBAS012010: Process 'Server:jo007006' finished with an exit status of 0
- In the server's $JBOSS_HOME/domain/servers/$SERVER/log/server.log, you can find the following stack trace on stderr:
14:35:53,579 ERROR [stderr] (main) java.net.ConnectException: JBAS012144: Could not connect to remote://host01:9999. The connection timed out
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:131)
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerConnection$ReconnectTask.connect(HostControllerConnection.java:312)
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.protocol.ProtocolConnectionManager.connect(ProtocolConnectionManager.java:70)
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerConnection.reConnect(HostControllerConnection.java:165)
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.server.mgmt.domain.HostControllerClient.reconnect(HostControllerClient.java:98)
14:35:53,690 ERROR [stderr] (main) at org.jboss.as.server.DomainServerMain.main(DomainServerMain.java:138)
14:35:53,692 ERROR [stderr] (main) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
14:35:53,692 ERROR [stderr] (main) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
14:35:53,692 ERROR [stderr] (main) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
14:35:53,692 ERROR [stderr] (main) at java.lang.reflect.Method.invoke(Method.java:597)
14:35:53,692 ERROR [stderr] (main) at org.jboss.modules.Module.run(Module.java:270)
14:35:53,693 ERROR [stderr] (main) at org.jboss.modules.Main.main(Main.java:411)
- Many sar reports were analysed to check for an OS performance issue. The overall OS statistics look fine, but the logs suggest memory-centric activity: page scanning and page freeing operations keep the CPU busy.
- If the JVM/JBoss is not able to respond within the 30-second interval, there could be various reasons for that, such as SAN issues on the OS side.
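The log checks above can be partly automated by scanning the three log files for the message codes shown in this article. The following is a minimal sketch: the marker list contains only codes that appear in the excerpts above, while the function names and the short descriptions are illustrative assumptions.

```python
# Markers taken from the log excerpts in this article; descriptions are informal.
MARKERS = {
    "ReadTimeoutException": "read timeout on the management connection",
    "JBAS010926": "host controller unregistered a server",
    "JBAS012010": "process finished (check the exit status)",
    "JBAS012144": "could not connect to the host controller (timed out)",
    "JBAS012174": "could not connect to the host controller (connection failed)",
    "JBAS012175": "management channel closed",
}


def scan(lines):
    """Yield (line_number, marker, meaning) for each line containing a known marker."""
    for number, line in enumerate(lines, start=1):
        for marker, meaning in MARKERS.items():
            if marker in line:
                yield number, marker, meaning


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced, not treated as errors."""
    with open(path, errors="replace") as handle:
        return list(scan(handle))
```

For example, calling `scan_file` on host-controller.log, process-controller.log, and the affected server's server.log, then comparing the timestamps of the matching lines, shows whether a read timeout preceded the failed reconnect and the server exit.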
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.