EJB client load balancing does not work correctly after a node is dropped from the cluster because of network or GC issues in JBoss EAP 6

Solution Verified - Updated 2 Aug 2024

Environment

Red Hat JBoss Enterprise Application Platform (EAP)
- 6.4

Issue

EJB remote invocations from a server acting as an EJB client (it is not confirmed whether the same applies when the client is a standalone application) to a clustered server hosting the EJB, made via a remote outbound connection, do not pick up cluster-view updates after a node is suspected because of a network or GC issue.

The issue appears to depend on the following conditions:
- The leaving node is not listed in the initial connections (remote-outbound-connection configuration or jboss-ejb-client.properties)
- The node is suspected
  - because of a GC pause (or a process suspension), so it is not responding; the issue is seen more often in this case
  - because of a network disconnect; here the behavior can differ because all nodes update their internal state
- The cluster heals while the client is still alive
  - a server (cluster) acting as an EJB client
  - a standalone client that continues without re-initializing the EJB context

The following symptoms can be seen:
- [client] Invocations made during the split that reach the unavailable node hang
- [server cluster] JGroups logs a message that a node is suspected and a new view is applied
- [server cluster] After nodeX is suspected, the client server receives an update "nodeX removed from cluster-view"
- [server cluster] JGroups logs messages with the new merged cluster view
- [client] After nodeX rejoins, the client server receives an update "node removed from cluster-view", where "node" is the node that is still connected, not the one that left!
  - in this case the client might restart the initial node detection
- [client] No client message about a new view appears after the cluster (JGroups) is correct again
  - in this case the client continues with only one node
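The first condition listed above (the leaving node not being among the initial connections) can be checked for a standalone client by comparing the initial connections in jboss-ejb-client.properties with the cluster members. The following is a minimal sketch, assuming the standard `remote.connections` and `remote.connection.<name>.host` keys; the class name `InitialConnectionCheck` and all host names are hypothetical. For a server acting as an EJB client, the equivalent information is in the remote-outbound-connection configuration instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class InitialConnectionCheck {

    /** Returns the hosts of all connections named in remote.connections. */
    static Set<String> initialHosts(Properties props) {
        Set<String> hosts = new LinkedHashSet<>();
        String names = props.getProperty("remote.connections", "");
        for (String name : names.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String host = props.getProperty("remote.connection." + trimmed + ".host");
            if (host != null) {
                hosts.add(host.trim());
            }
        }
        return hosts;
    }

    /** Returns the cluster member hosts that are NOT configured as an initial connection. */
    static List<String> missingFromInitial(Properties props, List<String> clusterHosts) {
        Set<String> initial = initialHosts(props);
        List<String> missing = new ArrayList<>();
        for (String host : clusterHosts) {
            if (!initial.contains(host)) {
                missing.add(host);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical client configuration that lists only node1 as an initial connection.
        String config = String.join("\n",
                "remote.connections=node1",
                "remote.connection.node1.host=node1.example.com",
                "remote.connection.node1.port=4447");
        Properties props = new Properties();
        props.load(new StringReader(config));

        // Hypothetical cluster members; node2 is not an initial connection.
        List<String> cluster = Arrays.asList("node1.example.com", "node2.example.com");
        System.out.println("Not in initial connections: " + missingFromInitial(props, cluster));
    }
}
```

With the sample configuration, the program prints `Not in initial connections: [node2.example.com]`, meaning node2 matches the first condition.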

Resolution

Apply JBoss EAP 6.4 Cumulative Patch (CP) 7 or later

Root Cause

bz-1290848: (6.4.z) EJB client loadbalancing will not work correct after a node is dropped from the cluster because of network or GC issues

Diagnostic Steps

Unfortunately, there is no client-side logging to track this issue.
The attached EJBClient.NodeSelectorTracking.btm script can be used to track node selection and cluster view changes on the client side.
See "Using Byteman to troubleshoot Java issues" for more details on how Byteman can be used.
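As a complement to the Byteman script (a different technique, not the attached script), the client's cluster node selection can also be made visible with a custom cluster node selector that logs each decision. This is a minimal sketch, assuming the jboss-ejb-client 1.x `org.jboss.ejb.client.ClusterNodeSelector` interface with the method `selectNode(String clusterName, String[] connectedNodes, String[] availableNodes)`. To keep the sketch compilable without the library, a local interface of the same shape stands in for it. The class name `LoggingClusterNodeSelector` is hypothetical, and its random choice among connected nodes is an illustration, not necessarily the default selection policy. Unlike the Byteman script, it only shows selection decisions, not every cluster view change.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.logging.Logger;

// Local stand-in with the assumed shape of org.jboss.ejb.client.ClusterNodeSelector
// (jboss-ejb-client 1.x). In a real client, implement the library interface instead.
interface ClusterNodeSelector {
    String selectNode(String clusterName, String[] connectedNodes, String[] availableNodes);
}

public class LoggingClusterNodeSelector implements ClusterNodeSelector {

    private static final Logger LOG = Logger.getLogger(LoggingClusterNodeSelector.class.getName());
    private final Random random = new Random();

    @Override
    public String selectNode(String clusterName, String[] connectedNodes, String[] availableNodes) {
        String[] connected = connectedNodes == null ? new String[0] : connectedNodes;
        String[] available = availableNodes == null ? new String[0] : availableNodes;

        // Prefer a node the client is already connected to; otherwise fall back to any known member.
        String[] candidates = connected.length > 0 ? connected : available;
        // Returns null when no node is known; how the client library handles null is not verified here.
        String selected = candidates.length > 0 ? candidates[random.nextInt(candidates.length)] : null;

        // Logging both arrays shows whether the client's view is stale,
        // e.g. a node that rejoined but never reappears in the available list.
        LOG.info(String.format("cluster=%s connected=%s available=%s selected=%s",
                clusterName, Arrays.toString(connected), Arrays.toString(available), selected));
        return selected;
    }

    public static void main(String[] args) {
        ClusterNodeSelector selector = new LoggingClusterNodeSelector();
        // Hypothetical view: only nodeA is connected, nodeB is known but not connected.
        String node = selector.selectNode("ejb", new String[] {"nodeA"}, new String[] {"nodeA", "nodeB"});
        System.out.println("Selected: " + node);
    }
}
```

For a standalone client, such a selector would presumably be registered in jboss-ejb-client.properties with a key of the form `remote.cluster.<cluster-name>.clusternode.selector` (verify the exact key for the client library version in use); for a server acting as an EJB client, the registration mechanism may differ.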

Product(s)

Red Hat JBoss Enterprise Application Platform

Components

jbossas

Category

Troubleshoot

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.