EJB client load balancing does not work correctly after a node is dropped from the cluster because of network or GC issues in JBoss EAP 6
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 6.4
Issue
- EJB remote invocations from a client server to a clustered server hosting the EJB, made through a remote outbound connection, no longer receive cluster-view updates after a node has been suspected because of a network or GC issue. It has not been confirmed whether the behaviour is the same when the client is a standalone application.
The issue appears to be related to the following conditions:
- The node that leaves the cluster is not listed in the initial connections (remote-outbound-connection configuration or jboss-ejb-client.properties)
- The node is suspected
  - because of GC (or a suspended JVM), so the node is not working; the issue is seen more often in this case
  - because of a network disconnect; the behaviour can differ in this case because all nodes update their internal state
- The cluster heals while the client is still alive
  - The client is a server (cluster) acting as an EJB client
  - A standalone client continues without re-initializing the EJB context
The following issues can be seen:
- [client] Invocations made during the split that reach the unavailable node hang
- [server cluster] JGroups logs a message that a node is suspected and a new view is applied
- [server cluster] After nodeX is suspected, the client server receives the update "nodeX removed from cluster-view"
- [server cluster] JGroups logs messages with the new merged cluster view
- [client] After nodeX rejoins, the client server receives the update "node removed from cluster-view", where "node" is the node that is still connected, not the one that left
  - In this case the client might restart the initial node detection
- [client] The client receives no view message after the cluster (JGroups) is correct again
  - In this case the client continues with only one node
Resolution
Apply JBoss EAP 6.4 Cumulative Patch (CP) 7 or later
Root Cause
Diagnostic Steps
Unfortunately, there is no client-side logging to track this issue.
The attached EJBClient.NodeSelectorTracking.btm script can be used to track the node selection and the cluster-view changes on the client side.
See Using Byteman to troubleshoot Java issues for more details on how Byteman can be used.
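As a complement to the Byteman script, the symptom "the client continues with only one node" can also be observed from the application side by counting which node answers each invocation. The following is only a minimal sketch under stated assumptions: NodeDistribution is a hypothetical helper, not part of EAP or the EJB client library, and it assumes the remote bean exposes a method (also hypothetical) that returns the server's node name, for example the value of the jboss.node.name system property. The node names in main are placeholder data.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of EAP or the EJB client library):
 * tallies which cluster node answered each remote invocation.
 */
public class NodeDistribution {
    private final Map<String, Integer> hits = new TreeMap<String, Integer>();

    /** Record the node name returned by one invocation. */
    public void record(String nodeName) {
        Integer current = hits.get(nodeName);
        hits.put(nodeName, current == null ? 1 : current + 1);
    }

    /** Per-node invocation counts, sorted by node name. */
    public Map<String, Integer> counts() {
        return hits;
    }

    /** True when at least one call was recorded and all calls reached the same node. */
    public boolean stuckOnSingleNode() {
        return hits.size() == 1;
    }

    public static void main(String[] args) {
        NodeDistribution distribution = new NodeDistribution();
        // Placeholder values; a real client would record the result of each
        // remote call, e.g. a bean method returning System.getProperty("jboss.node.name").
        String[] observed = {"node1", "node2", "node1", "node1"};
        for (String node : observed) {
            distribution.record(node);
        }
        System.out.println(distribution.counts());
        System.out.println("single node only: " + distribution.stuckOnSingleNode());
    }
}
```

If a run of invocations made after the cluster has healed reports a single node only, although the JGroups view on the servers lists several members, that is consistent with the behaviour described in the Issue section. It does not replace the Byteman tracking, which shows the cluster-view changes themselves.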