Repeated JGroups Flow Control (FC) Warning Messages and threads waiting in FC

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 4.2.x
    • 4.3.x
    • 5.x
    • 6.x

Issue

  • Warning messages like these appear in log/server.log or the console:

      WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999689 credits
      WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 1999093 credits
      WARN [org.jgroups.protocols.FC] Received two credit requests from 127.0.0.2:64842 without any intervening messages; sending 2000000 credits
    
  • The following WARN message appears continuously in the log file over a long period:

      WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 1986459 credits
    
  • JBoss is slow and we see request threads stalling in replication activity and JGroups FlowControl:

"ajp-/127.0.0.1:8009-1" daemon prio=10 tid=0x00007f82f40bf800 nid=0x7a2a in Object.wait() [0x00007f82cddd9000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
	at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:553)
	- locked <0x00000000b6794e60> (a org.jgroups.protocols.FlowControl$Credit)
	at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:114)

Resolution

These messages mean that the JGroups Flow Control (FC) protocol is having trouble regulating the flow of JGroups messages: one node is repeatedly requesting FC credits without sending anything else in between. Until that node receives more credits, it sends no further JGroups messages, which can cause an application using a JGroups channel in this state to hang.
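
As a rough illustration of why a sender stalls, credit-based flow control can be pictured as a byte budget that the sender spends on each message and the receiver tops up again. The sketch below is a toy model under that assumption, not the JGroups implementation: the class `CreditSketch` and its methods are hypothetical, and the 2000000 figure is simply borrowed from the log messages above.

```java
// Toy model of credit-based flow control. NOT the JGroups implementation;
// all names here are hypothetical and exist only for illustration.
final class CreditSketch {
    private final long maxCredits; // assumed per-receiver budget, in bytes
    private long credits;          // credits the sender still holds

    CreditSketch(long maxCredits) {
        this.maxCredits = maxCredits;
        this.credits = maxCredits;
    }

    /** Spends credits for one message; false means the sender would have to wait. */
    synchronized boolean trySend(long messageBytes) {
        if (messageBytes > credits) {
            // A real sender would block here and ask the receiver for more credits.
            return false;
        }
        credits -= messageBytes;
        return true;
    }

    /** Models the receiver granting credits: the sender is topped up to the maximum. */
    synchronized void replenish() {
        credits = maxCredits;
    }

    synchronized long remaining() {
        return credits;
    }

    public static void main(String[] args) {
        CreditSketch toReceiver = new CreditSketch(2000000L);
        System.out.println(toReceiver.trySend(1500000L)); // enough credits
        System.out.println(toReceiver.trySend(600000L));  // not enough: sender would wait
        toReceiver.replenish();                           // receiver grants credits
        System.out.println(toReceiver.trySend(600000L));  // sending can resume
    }
}
```

In this model, a request thread like the one in the thread dump above corresponds to the case where `trySend` returns false: it cannot proceed until credits arrive.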

There are several potential causes:

  • Using Hardware of Unequal Processing Power

    • In general, JGroups cluster nodes are peers, so each machine in the cluster should have similar computing capacity.
  • A Lossy Network with Respect to UDP

    • This can only be the cause if UDP clustering is used. See diagnostic steps.
    • Resolving UDP clustering issues typically requires that a network operations team be involved.
    • Switching to TCP clustering may be an alternative.
  • A Sustained High-CPU Situation

    • Obtain GC logs. See diagnostic steps.
  • An Application Specific Problem

    • If an application repeatedly sends large JGroups messages (large web session objects, for example), it can cause this situation.
  • An Overall System Performance Issue (Such as Swapping)
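
To judge whether large session objects are a plausible cause, it can help to estimate how many bytes a session attribute occupies when serialized. The following is a minimal sketch using plain Java serialization; the class `SessionSizeSketch` is hypothetical, and the result is only an approximation, because the marshalling the server actually uses for replication may produce a different size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: estimates the serialized size of a session attribute.
final class SessionSizeSketch {

    /** Returns the number of bytes produced by standard Java serialization. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            try {
                out.writeObject(value);
            } finally {
                out.close(); // flushes buffered data into the byte array
            }
        } catch (IOException e) {
            // Writing to memory should not fail unless the value is not serializable.
            throw new IllegalStateException("Could not serialize value", e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Stand-in for a large session attribute (512 KB payload).
        byte[] attribute = new byte[512 * 1024];
        System.out.println("Approximate serialized size in bytes: "
                + serializedSize(attribute));
    }
}
```

Attributes whose size approaches the credit values seen in the warnings (around 2000000 in the examples above, assuming credits are counted in bytes) would consume most of the available credits in a single replication.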

To help make flow control blocks less likely and limit their impact on request time, you can consider:

  • Change the replication granularity so that less data is replicated and fewer flow control credits are used. In your application's WEB-INF/jboss-web.xml:

      <jboss-web>
          <replication-config>
             <replication-granularity>ATTRIBUTE</replication-granularity>
          </replication-config>
      </jboss-web>
    
  • Change to the INTERVAL snapshot mode. With the default, INSTANT, each request thread replicates its session data before returning a response, so response time is directly affected by any delay in that cluster activity. With INTERVAL, session data replication is offloaded from the request threads to a background thread that runs on a periodic interval. The caveat is that data is replicated in bulk and not as soon after a session data change as with INSTANT, but this avoids the response time impact of replication activity. This can also be changed in your WEB-INF/jboss-web.xml:

      <jboss-web>
          <replication-config>
            <snapshot-mode>INTERVAL</snapshot-mode>
          </replication-config>
      </jboss-web>
    

Diagnostic Steps

  • If UDP clustering is used, the following steps can be taken to determine if a lossy network with respect to UDP datagrams is the issue:
  • If the message WARN [org.jgroups.protocols.FC] Received two credit requests from x.x.x.x:64842 without any intervening messages; sending 2000000 credits occurs in bursts (many at the same time, then another batch minutes later), it is often an indicator of an underlying GC issue. Under normal operation, FC only requests more credits occasionally.
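
One way to check for such bursts is to count the FC warnings per minute and compare the busiest minutes with pauses recorded in the GC logs. The sketch below is a hypothetical helper, not a Red Hat tool: it assumes each log line begins with a timestamp in the form `yyyy-MM-dd HH:mm:ss,SSS`, and the timestamps in the sample lines are invented for illustration. Adjust the prefix length if your log format differs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups FC credit-request warnings by minute.
final class FcWarningBuckets {
    private static final String CATEGORY = "org.jgroups.protocols.FC";
    private static final String MESSAGE = "Received two credit requests";
    private static final int MINUTE_PREFIX = 16; // length of "yyyy-MM-dd HH:mm" (assumed format)

    /** Returns warning counts keyed by minute, sorted chronologically. */
    static Map<String, Integer> countPerMinute(List<String> logLines) {
        Map<String, Integer> perMinute = new TreeMap<String, Integer>();
        for (String line : logLines) {
            boolean isFcWarning = line.contains(CATEGORY) && line.contains(MESSAGE);
            if (!isFcWarning || line.length() < MINUTE_PREFIX) {
                continue; // skip unrelated or malformed lines
            }
            String minute = line.substring(0, MINUTE_PREFIX);
            Integer seen = perMinute.get(minute);
            perMinute.put(minute, seen == null ? 1 : seen + 1);
        }
        return perMinute;
    }

    public static void main(String[] args) {
        // Sample lines with invented timestamps; the message text follows the warnings above.
        List<String> sample = Arrays.asList(
            "2013-01-01 10:15:01,100 WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 1986459 credits",
            "2013-01-01 10:15:01,350 WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 2000000 credits",
            "2013-01-01 10:15:02,000 INFO [some.other.Category] unrelated line",
            "2013-01-01 10:19:40,020 WARN [org.jgroups.protocols.FC] Received two credit requests from 192.168.10.11:55200 without any intervening messages; sending 2000000 credits");
        for (Map.Entry<String, Integer> entry : countPerMinute(sample).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Minutes with a high count separated by quiet periods match the burst pattern described above and are the ones worth comparing against the GC logs.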

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.