Slow/Blocked EJB client when multiple JBoss EAP instances are configured as backend EJB servers in an unstable network environment

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (JBoss EAP)
    • 7.x

Issue

When an EJB client (either a standalone Java EJB client or another JBoss EAP server making server-to-server EJB calls) is configured to connect to multiple back-end EJB/EAP server instances, the client can become slow or blocked when one or more back-end EJB nodes drop off the network or have slow network connections.

Thread dumps captured while the EJB client is slow or blocked may show threads like the following, waiting in DiscoveryEJBClientInterceptor.doAnyDiscovery():

"http-executor task-3" #801 prio=5 os_prio=0 tid=0x0000556d51610800 nid=0xca8 waiting on condition [0x00007f8617b4b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000dc522618> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.wildfly.discovery.Discovery$BlockingQueueServicesQueue.await(Discovery.java:250)
	at org.wildfly.discovery.Discovery$BlockingQueueServicesQueue.takeService(Discovery.java:282)
	at org.jboss.ejb.client.DiscoveryEJBClientInterceptor.doAnyDiscovery(DiscoveryEJBClientInterceptor.java:493)

"http-executor task-1" #363 prio=5 os_prio=0 tid=0x00007f864550c800 nid=0x7a3 waiting on condition [0x00007f861b54b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000f910bd80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.wildfly.discovery.Discovery$BlockingQueueServicesQueue.await(Discovery.java:250)
	at org.wildfly.discovery.Discovery$BlockingQueueServicesQueue.takeService(Discovery.java:282)
	at org.jboss.ejb.client.DiscoveryEJBClientInterceptor.doAnyDiscovery(DiscoveryEJBClientInterceptor.java:493)
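
When a thread dump is large, it can help to list only the threads that are parked in discovery. The following is a minimal sketch, not part of EAP: the class name DiscoveryWaitScanner and its method are hypothetical, and it assumes the dump is plain text in the format shown above, where each thread starts with a line beginning with a double quote.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads whose stack contains the discovery frame.
final class DiscoveryWaitScanner {

    // Frame name copied from the thread dump in this article.
    static final String FRAME =
            "org.jboss.ejb.client.DiscoveryEJBClientInterceptor.doAnyDiscovery";

    static List<String> threadsWaitingOnDiscovery(String dump) {
        List<String> names = new ArrayList<>();
        String current = null;    // name of the thread whose stack is being read
        boolean listed = false;   // true once the current thread has been added
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // Thread header line: the name is the text between the first two quotes.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                listed = false;
            } else if (current != null && !listed && line.contains(FRAME)) {
                names.add(current);
                listed = true;
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java DiscoveryWaitScanner <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String name : threadsWaitingOnDiscovery(dump)) {
            System.out.println(name);
        }
    }
}
```

If several dumps taken a few seconds apart keep listing the same request threads, that suggests the client is waiting on discovery rather than on the EJB call itself.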

Resolution

Note: These properties are valid for EAP 7.3 Update 5 and later.

The EJB client has a discovery timeout that controls how long it waits while discovering the pre-configured back-end EJB servers.
By default this timeout is 0, which means no limit. To change it, set the Java system property -Dorg.jboss.ejb.client.discovery.timeout (unit: seconds).

If the EJB client is a standalone Java program, add this property to the Java command line.
If the EJB client is another EAP server instance, add this property to standalone.conf or domain.conf, depending on your EAP's management mode.

In addition, the fix for EJBCLIENT-356 introduces an "additional timeout" setting: -Dorg.jboss.ejb.client.discovery.additional-node-timeout (unit: milliseconds).
This additional timeout can be set to a relatively short value so that the EJB client can start calling EJBs as soon as one active back-end node has been discovered/connected, instead of waiting for all back-end EAP nodes to be discovered/connected, which may take too long in an unstable network environment.
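
The interaction of the two timeouts can be illustrated with a small simulation. This is only a sketch of the idea, not the EJB client's actual code: the class TwoStageDiscoverySketch and the method collectNodes are hypothetical, the additional timeout is treated as a single budget for all remaining nodes (an assumption), and the "0 means no limit" default is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of "wait long for the first node, briefly for the rest".
final class TwoStageDiscoverySketch {

    static List<String> collectNodes(BlockingQueue<String> discovered,
                                     long mainTimeoutSeconds,
                                     long additionalTimeoutMillis) throws InterruptedException {
        List<String> nodes = new ArrayList<>();

        // Stage 1: wait up to the main timeout for the first node.
        String first = discovered.poll(mainTimeoutSeconds, TimeUnit.SECONDS);
        if (first == null) {
            return nodes; // nothing discovered in time
        }
        nodes.add(first);

        // Stage 2: wait only the (shorter) additional timeout for further nodes.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(additionalTimeoutMillis);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            String next = discovered.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break; // additional budget used up
            }
            nodes.add(next);
        }
        return nodes;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread feeder = new Thread(() -> {
            try {
                Thread.sleep(100);
                queue.add("node-a");  // healthy node answers quickly
                Thread.sleep(5000);
                queue.add("node-b");  // slow node answers far too late
            } catch (InterruptedException ignored) {
                // simulation only
            }
        });
        feeder.setDaemon(true);
        feeder.start();

        long start = System.nanoTime();
        List<String> nodes = collectNodes(queue, 10, 500);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(nodes + " after about " + elapsedMs + " ms");
    }
}
```

In this simulation the caller proceeds with the one healthy node shortly after the additional timeout expires, rather than waiting for the slow node.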

Example of setting these two properties:

-Dorg.jboss.ejb.client.discovery.timeout=10 -Dorg.jboss.ejb.client.discovery.additional-node-timeout=500

This sets the main discovery timeout to 10 seconds. Once one EJB server node has been discovered/connected, the timeout for the remaining nodes/instances is 0.5 seconds (500 milliseconds). Note that from EAP 7.4.10 onward, additional-node-timeout defaults to 200 ms instead of being unbounded (EJBCLIENT-485).
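
To confirm that the JVM actually received the flags, a small check like the following can be run with the same command-line options. It is a sketch only: the class name DiscoveryTimeoutCheck is hypothetical, and it reports what System.getProperty returns without involving the EJB client itself.

```java
// Hypothetical check: prints the two discovery properties as seen by this JVM.
final class DiscoveryTimeoutCheck {

    static final String MAIN_TIMEOUT = "org.jboss.ejb.client.discovery.timeout";
    static final String ADDITIONAL_TIMEOUT =
            "org.jboss.ejb.client.discovery.additional-node-timeout";

    static String describe(String name, String unit) {
        String raw = System.getProperty(name);
        if (raw == null) {
            return name + " is not set";
        }
        try {
            return name + " = " + Long.parseLong(raw.trim()) + " " + unit;
        } catch (NumberFormatException e) {
            return name + " has a non-numeric value: " + raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MAIN_TIMEOUT, "s"));         // unit: seconds
        System.out.println(describe(ADDITIONAL_TIMEOUT, "ms"));  // unit: milliseconds
    }
}
```

For example: java -Dorg.jboss.ejb.client.discovery.timeout=10 -Dorg.jboss.ejb.client.discovery.additional-node-timeout=500 DiscoveryTimeoutCheck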

-Dorg.jboss.ejb.client.discovery.additional-node-timeout is only available in releases that include the fix for EJBCLIENT-356.

NOTE
To apply the system properties above, you need to be on EAP 7.3 Update 5 or later.

Root Cause

This is reported as bug EJBCLIENT-356.

