Repeated JGroups Flow Control (FC) Warning Messages and threads waiting in FC
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 4.2.x
- 4.3.x
- 5.x
- 6.x
Issue
- Warning messages like these appear in log/server.log or the console:

  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999689 credits
  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999093 credits
  WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 2000000 credits
- The following WARN messages appear non-stop in the log file over a long period:

  WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 1986459 credits

- JBoss is slow, and request threads are seen stalling in replication activity and JGroups FlowControl:
"ajp-/127.0.0.1:8009-1" daemon prio=10 tid=0x00007f82f40bf800 nid=0x7a2a in Object.wait() [0x00007f82cddd9000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:553)
- locked <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:114)
Resolution
These messages mean that the JGroups Flow Control (FC) protocol is having trouble regulating the flow of JGroups messages: one node is repeatedly asking for FC credits without sending anything else. Until the requesting node receives more credits, it cannot send further JGroups messages, which can hang an application using a JGroups channel in this state.
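To illustrate the mechanism, here is a deliberately simplified sketch of credit-based flow control. This is hypothetical code, not JGroups' actual implementation; the names loosely mirror the FlowControl$Credit class visible in the thread dump above.

```java
// Hypothetical simplification of credit-based flow control (not JGroups' real code).
// A sender spends credits per message and blocks when too few remain, until the
// receiver's credit-replenishment response arrives.
public class CreditSketch {

    static final class Credit {
        private long credits;

        Credit(long initial) {
            credits = initial;
        }

        // Block (up to timeoutMs) until enough credits are available, then spend them.
        // Returns false on timeout, in which case the caller would re-request credits
        // from the receiver -- repeated re-requests are what trigger the WARN message.
        synchronized boolean decrementIfEnoughCredits(long amount, long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < amount) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // out of credits and timed out
                }
                try {
                    wait(remaining); // request threads stall here in the thread dump
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            credits -= amount;
            return true;
        }

        // Called when the receiver sends a credit response; wakes blocked senders.
        synchronized void replenish(long amount) {
            credits += amount;
            notifyAll();
        }
    }

    public static void main(String[] args) {
        Credit credit = new Credit(1000);
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // enough credits
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // blocks, then times out
        credit.replenish(2_000_000);                                   // receiver grants credits
        System.out.println(credit.decrementIfEnoughCredits(800, 100)); // can send again
    }
}
```

When no credit response arrives (lost on a lossy network, or delayed by a GC pause or an overloaded peer), senders stay blocked in the wait above, which matches the TIMED_WAITING request threads in the stack trace.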
There are several potential causes:
- Using Hardware of Unequal Processing Power
  - In general, JGroups cluster nodes are peers, so each box in use should have similar computing capacity.
- A Lossy Network with Respect to UDP
  - This can only be the cause if UDP clustering is used. See the diagnostic steps.
  - Resolving UDP clustering issues typically requires that a network operations team be involved.
  - Switching to TCP clustering may be an alternative.
- A Sustained High-CPU Situation
  - Obtain GC logs. See the diagnostic steps.
- An Application-Specific Problem
  - If an application repeatedly sends large JGroups messages (large web session objects, for example), it could cause this situation.
- An overall system performance issue (such as swapping)
  - Capture a sosreport to review system performance and memory usage.
  - In a virtual environment, see Unaccounted memory usage when running Red Hat Enterprise Linux as a VMware guest.
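If switching to TCP clustering is considered, a TCP-based JGroups stack looks roughly like the following. This is a sketch in JGroups 2.x-style XML (as used by the EAP 4.x/5.x line); the port, host list, and timeout values are placeholder assumptions to adapt to your environment, and EAP 6 configures the stack differently (in standalone-ha.xml).

```
  <config>
    <!-- TCP transport instead of UDP; each node binds a fixed port -->
    <TCP bind_port="7800"/>
    <!-- Static discovery: list every cluster member explicitly -->
    <TCPPING initial_hosts="node1[7800],node2[7800]" port_range="1"
             timeout="3000" num_initial_members="2"/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK use_mcast_xmit="false"
                   retransmit_timeout="300,600,1200,2400,4800"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
  </config>
```

TCP trades the efficiency of multicast for reliable per-connection delivery, which sidesteps UDP datagram loss at the cost of more connections and bandwidth as the cluster grows.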
To help make flow control blocks less likely and limit their impact on request time, you can consider:
- Changing the replication granularity so the replicated data is smaller and consumes fewer flow control credits. In your application's WEB-INF/jboss-web.xml:

  <jboss-web>
    <replication-config>
      <replication-granularity>ATTRIBUTE</replication-granularity>
    </replication-config>
  </jboss-web>

- Changing to INTERVAL snapshot mode. With the default INSTANT mode, the request threads themselves replicate their session data before returning a response, so response time is directly impacted by delays in that cluster activity. With INTERVAL, session data replication is offloaded from the request threads to a background thread that runs on a periodic interval. The caveat is that data is replicated in bulk and not as quickly after a session data change as with INSTANT, but this avoids the response-time impact of replication activity. This can also be changed in WEB-INF/jboss-web.xml:

  <jboss-web>
    <replication-config>
      <snapshot-mode>INTERVAL</snapshot-mode>
    </replication-config>
  </jboss-web>
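Depending on the version in use, the FC protocol's credit pool can also be tuned in the JGroups stack configuration so that senders block less often, at the cost of more data buffered in flight. The values below are illustrative assumptions, not recommendations (the default max_credits corresponds to the ~2000000 figure in the WARN messages above):

```
  <!-- Sketch: enlarge the credit pool in the FC protocol of the stack -->
  <FC max_credits="4000000"
      min_threshold="0.20"/>
```

Raising max_credits only masks the symptom if the underlying cause (lossy network, GC pauses, overloaded peer) is not addressed.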
Diagnostic Steps
- If UDP clustering is used, the following steps can be taken to determine if a network that is lossy with respect to UDP datagrams is the issue:
  - Obtain JGroups Probe information for the cluster channel in question and examine the NAKACK retransmit statistics
  - Set the log level to TRACE for org.jgroups.protocols.pbcast.NAKACK (or alternatively org.jgroups) to see if JGroups messages are being lost and retransmitted
  - Ask the network operations team to monitor the network for dropped UDP datagrams (the percentage of dropped unicast and multicast datagrams is important)
- If high CPU utilization is involved or suspected, enable garbage collection (GC) logging and look for prolonged GC activity
- If the message WARN [org.jgroups.protocols.FC] Received two credit requests from x.x.x.x:64842 without any intervening messages; sending 2000000 credits occurs in large bursts (many at once, then another burst minutes later), it is often an indicator of an underlying GC issue. Under normal operation, FC requests more credits only occasionally.
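To enable GC logging for the diagnostic step above, classic HotSpot flags such as the following can be appended to JAVA_OPTS. This is a sketch for the JDK 6/7-era JVMs that ship with these EAP versions; the log path is an assumption to adapt to your installation.

```shell
# Append to JAVA_OPTS in run.conf (EAP 4/5) or standalone.conf (EAP 6)
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log"
```

In the resulting log, look for long pauses (full GCs in particular) that line up in time with the bursts of FC credit-request warnings.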
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.