Repeated JGroups Flow Control (FC) Warning Messages and threads waiting in FC
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 4.2.x
- 4.3.x
- 5.x
- 6.x
Issue
- Warning messages like these appear in log/server.log or the console:

  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999689 credits
  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999093 credits
  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 2000000 credits
- The following WARN messages appear non-stop in the log file over a long period:

  WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 1986459 credits

- JBoss is slow, and request threads are seen stalling in replication activity and JGroups FlowControl:
"ajp-/127.0.0.1:8009-1" daemon prio=10 tid=0x00007f82f40bf800 nid=0x7a2a in Object.wait() [0x00007f82cddd9000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:553)
- locked <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:114)
Resolution
These messages mean that the JGroups Flow Control (FC) protocol is having trouble regulating the flow of JGroups messages: one node is repeatedly asking for FC credits without sending anything else. Until the requesting node receives more credits, it cannot send further JGroups messages, which can hang an application using a JGroups channel in this state.
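To illustrate the mechanism, here is a deliberately simplified sketch of credit-based flow control. This is hypothetical code, not JGroups' actual implementation; the names loosely mirror the FlowControl$Credit class visible in the thread dump above.

```java
// Hypothetical simplification of credit-based flow control (not JGroups' real code).
// A sender spends credits per message and blocks when too few remain, until the
// receiver's credit-replenishment response arrives.
public class CreditSketch {

    static final class Credit {
        private long credits;

        Credit(long initial) {
            credits = initial;
        }

        // Block (up to timeoutMs) until enough credits are available, then spend them.
        // Returns false on timeout, in which case the caller would re-request credits
        // from the receiver -- repeated re-requests are what trigger the WARN message.
        synchronized boolean decrementIfEnoughCredits(long amount, long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < amount) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // out of credits and timed out
                }
                try {
                    wait(remaining); // request threads stall here in the thread dump
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            credits -= amount;
            return true;
        }

        // Called when the receiver sends a credit response; wakes blocked senders.
        synchronized void replenish(long amount) {
            credits += amount;
            notifyAll();
        }
    }

    public static void main(String[] args) {
        Credit credit = new Credit(1000);
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // enough credits
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // blocks, then times out
        credit.replenish(2_000_000);                                   // receiver grants credits
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // can send again
    }
}
```

When no credit response arrives (lost on a lossy network, or delayed by a GC pause or an overloaded peer), senders stay blocked in the wait above, which matches the TIMED_WAITING request threads in the stack trace.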
There are several potential causes:
- Using Hardware of Unequal Processing Power
  - In general, JGroups cluster nodes are peers, so each box in use should have similar computing capacity.
- A Lossy Network with Respect to UDP
  - This can only be the cause if UDP clustering is used. See the diagnostic steps.
  - Resolving UDP clustering issues typically requires that a network operations team be involved.
  - Switching to TCP clustering may be an alternative.
- A Sustained High-CPU Situation
  - Obtain GC logs. See the diagnostic steps.
- An Application-Specific Problem
  - If an application repeatedly sends large JGroups messages (large web session objects, for example), it could cause this situation.
- An overall system performance issue (such as swapping)
  - Capture a sosreport to review system performance and memory usage.
  - In a virtual environment, see Unaccounted memory usage when running Red Hat Enterprise Linux as a VMware guest.
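If switching to TCP clustering is considered, a TCP-based JGroups stack looks roughly like the following. This is a sketch in JGroups 2.x-style XML (as used by the EAP 4.x/5.x line); the port, host list, and timeout values are placeholder assumptions to adapt to your environment, and EAP 6 configures the stack differently (in standalone-ha.xml).

```
  <config>
    <!-- TCP transport instead of UDP; each node binds a fixed port -->
    <TCP bind_port="7800"/>
    <!-- Static discovery: list every cluster member explicitly -->
    <TCPPING initial_hosts="node1[7800],node2[7800]" port_range="1"
             timeout="3000" num_initial_members="2"/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK use_mcast_xmit="false"
                   retransmit_timeout="300,600,1200,2400,4800"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
  </config>
```

TCP trades the efficiency of multicast for reliable per-connection delivery, which sidesteps UDP datagram loss at the cost of more connections and bandwidth as the cluster grows.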
To help make flow control blocks less likely and limit their impact on request time, you can consider:
- Changing the replication granularity so the replicated data is smaller and consumes fewer flow control credits. In your application's WEB-INF/jboss-web.xml:

  <jboss-web>
    <replication-config>
      <replication-granularity>ATTRIBUTE</replication-granularity>
    </replication-config>
  </jboss-web>

- Changing to INTERVAL snapshot mode. With the default INSTANT mode, the request threads themselves replicate their session data before returning a response, so response time is directly impacted by delays in that cluster activity. With INTERVAL, session data replication is offloaded from the request threads to a background thread that runs on a periodic interval. The caveat is that data is replicated in bulk and not as quickly after a session data change as with INSTANT, but this avoids the response-time impact of replication activity. This can also be changed in WEB-INF/jboss-web.xml:

  <jboss-web>
    <replication-config>
      <snapshot-mode>INTERVAL</snapshot-mode>
    </replication-config>
  </jboss-web>
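Depending on the version in use, the FC protocol's credit pool can also be tuned in the JGroups stack configuration so that senders block less often, at the cost of more data buffered in flight. The values below are illustrative assumptions, not recommendations (the default max_credits corresponds to the ~2000000 figure in the WARN messages above):

```
  <!-- Sketch: enlarge the credit pool in the FC protocol of the stack -->
  <FC max_credits="4000000"
      min_threshold="0.20"/>
```

Raising max_credits only masks the symptom if the underlying cause (lossy network, GC pauses, overloaded peer) is not addressed.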
Diagnostic Steps
- If UDP clustering is used, the following steps can be taken to determine if a network that is lossy with respect to UDP datagrams is the issue:
  - Obtain JGroups Probe information for the cluster channel in question and examine the NAKACK retransmit statistics
  - Set the log level to TRACE for org.jgroups.protocols.pbcast.NAKACK (or alternatively org.jgroups) to see if JGroups messages are being lost and retransmitted
  - Ask the network operations team to monitor the network for dropped UDP datagrams (the percentage of dropped unicast and multicast datagrams is important)
- If high CPU utilization is involved or suspected, enable garbage collection (GC) logging and look for prolonged GC activity
- If the message WARN [org.jgroups.protocols.FC] Received two credit requests from x.x.x.x:64842 without any intervening messages; sending 2000000 credits occurs in large bursts (many at once, then another burst minutes later), it is often an indicator of an underlying GC issue. Under normal operation, FC requests more credits only occasionally.
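To enable GC logging for the diagnostic step above, classic HotSpot flags such as the following can be appended to JAVA_OPTS. This is a sketch for the JDK 6/7-era JVMs that ship with these EAP versions; the log path is an assumption to adapt to your installation.

```shell
# Append to JAVA_OPTS in run.conf (EAP 4/5) or standalone.conf (EAP 6)
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log"
```

In the resulting log, look for long pauses (full GCs in particular) that line up in time with the bursts of FC credit-request warnings.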
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.