Client hangs on remote EJB calls during network / power outage in EAP 7 / 6
Environment
Red Hat JBoss Enterprise Application Platform (EAP) 7.0 CP4
Issue
- A client is blocked if the network connection to the server is broken or the server machine loses power. The client only continues after a network timeout.
- EJB client connections appear to be hanging in the event of a network/power outage.
- Standalone client -> 2-Server Domain (SLSB)
- vm1: server-1
- vm2: server-2
- vm3: standalone Java client
- The standalone Java client is connected to the two servers as defined in jboss-ejb-client.properties and load balances between them. When one of the VMs is powered off (e.g. vm2, running server-2), the client continues to send requests to both server-1 and server-2 (which is powered down). As a result, the client is completely blocked and receives an invocation timeout exception after 31 s ("No invocation response received in 31000 milliseconds"). After vm2 is restarted, the client terminates without notice.
- The same test worked well using a (clustered) stateful session bean (SFSB).
Resolution
The client is blocked if the network connection to the server is broken or the server machine loses power, and it only recovers after a network timeout.
The following parameters protect against client threads waiting in awaitResponse indefinitely. READ_TIMEOUT must be greater than HEARTBEAT_INTERVAL with some additional margin, so here READ_TIMEOUT is set to 2 x HEARTBEAT_INTERVAL. The remoting connection to the remote EAP server sends a small heartbeat message every 30 seconds, which prevents firewalls from closing what they might consider idle connections. With READ_TIMEOUT set, if the remoting connection receives no response to the heartbeat within 60 seconds, it errors out the waiting clients and reconnects. This frees the client threads blocked in awaitResponse: they receive an exception, and the client can then process new requests.
connect.options.org.xnio.Options.READ_TIMEOUT=60000
connect.options.org.xnio.Options.KEEP_ALIVE=true
org.jboss.remoting3.RemotingOptions.HEARTBEAT_INTERVAL=30000
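As a sketch of the timeout rule above, the following Java program parses the three properties with java.util.Properties and checks that READ_TIMEOUT is at least twice HEARTBEAT_INTERVAL. The class and method names (TimeoutCheck, isSafe, parse) are hypothetical, treating 2x as the minimum margin is an assumption taken from the recommendation above, and the program only inspects the key strings exactly as written here; it does not configure the EJB client itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class TimeoutCheck {
    // Keys copied verbatim from the property lines in this article.
    static final String READ_TIMEOUT_KEY = "connect.options.org.xnio.Options.READ_TIMEOUT";
    static final String HEARTBEAT_KEY = "org.jboss.remoting3.RemotingOptions.HEARTBEAT_INTERVAL";

    // Assumed rule: READ_TIMEOUT must be at least 2 x HEARTBEAT_INTERVAL
    // so a single delayed heartbeat does not trip the read timeout.
    static boolean isSafe(long readTimeoutMs, long heartbeatMs) {
        return heartbeatMs > 0 && readTimeoutMs >= 2 * heartbeatMs;
    }

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String config = String.join("\n",
            "connect.options.org.xnio.Options.READ_TIMEOUT=60000",
            "connect.options.org.xnio.Options.KEEP_ALIVE=true",
            "org.jboss.remoting3.RemotingOptions.HEARTBEAT_INTERVAL=30000");
        Properties props = parse(config);
        long read = Long.parseLong(props.getProperty(READ_TIMEOUT_KEY));
        long heartbeat = Long.parseLong(props.getProperty(HEARTBEAT_KEY));
        // Prints: READ_TIMEOUT=60000 HEARTBEAT_INTERVAL=30000 safe=true
        System.out.println("READ_TIMEOUT=" + read + " HEARTBEAT_INTERVAL=" + heartbeat
            + " safe=" + isSafe(read, heartbeat));
    }
}
```

With the values above, a dead peer is detected no later than roughly one READ_TIMEOUT (60 s) after the last successful read, while a healthy but idle connection never reaches that limit because a heartbeat arrives every 30 s.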
Note: In EAP 7.0, this is a known issue earmarked for EAP 7.0 CP7. Ref: JBEAP-10259
Note: An update to the EJB client JAR is required for the fix to take effect. This bug is already fixed in EAP 7.1, so there all that is needed is to set READ_TIMEOUT / WRITE_TIMEOUT.
Diagnostic Steps
Thread dumps taken on the client side show all of the client threads with awaitResponse in their stack traces, which indicates that each thread sent an EJB request and is waiting for a response. If the network connection was lost, these threads do not progress and wait indefinitely.
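A thread dump (for example from jstack) is the usual way to inspect a running client. As an illustrative alternative, assuming you can run code inside the client JVM, the hypothetical helper below uses Thread.getAllStackTraces() to list threads whose stack contains a frame for a method named awaitResponse. It matches on the method name only, because the declaring class is not given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AwaitResponseScan {
    // Returns the names of threads whose stack contains a frame
    // for a method named "awaitResponse".
    static List<String> findAwaiting(Map<Thread, StackTraceElement[]> dump) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if ("awaitResponse".equals(frame.getMethodName())) {
                    names.add(e.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM only.
        List<String> stuck = findAwaiting(Thread.getAllStackTraces());
        System.out.println(stuck.size() + " thread(s) in awaitResponse: " + stuck);
    }
}
```

Running the scan a few times in a row helps distinguish threads that are merely waiting for a slow response from threads that stay in awaitResponse indefinitely.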
On the server side, if an EJB receives a request and starts processing it, but the network connection is lost before it sends the response, ERRORs are logged in server.log. The EJB logs a 'NotOpenException: Writes closed' message, as shown below, indicating that there was nowhere to send the response.
ERROR ... JBAS014250: Could not write method invocation failure for method public abstract java.lang.String jboss.example.ejb.Hello.sayHello(int) on bean named Hello for appname modulename ejb-client distinctname due to: java.io.IOException: JBAS014560: Could not open message outputstream for writing to Channel
...
Caused by: org.jboss.remoting3.NotOpenException: Writes closed
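To check whether this has happened, a minimal sketch like the following scans a log file for the NotOpenException marker from the excerpt above. The class name and the default file name server.log are assumptions; pass the actual path of your server.log as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ServerLogScan {
    // Marker text taken from the "Caused by" line in the log excerpt.
    static final String MARKER = "NotOpenException: Writes closed";

    // Keeps only the lines that contain the marker.
    static List<String> matches(Stream<String> lines) {
        return lines.filter(l -> l.contains(MARKER)).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; override with the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "server.log");
        try (Stream<String> lines = Files.lines(log)) {
            List<String> hits = matches(lines);
            System.out.println(hits.size() + " line(s) contain \"" + MARKER + "\"");
            hits.forEach(System.out::println);
        }
    }
}
```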
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.