The JGroups Flow Control (FC) Protocol and Deadlocks with Web Session Replication in JBoss

Solution Verified - Updated

Environment

  • JBoss Enterprise Application Platform (EAP)
    • 4.2
    • 4.3

Issue

  • Most or all JBoss http, https, or ajp threads appear to be stuck or blocked in the JGroups FC protocol:
    "http-localhost-8080-19" daemon prio=1 tid=0x0000000043f460f0 nid=0x7d76 in Object.wait() [0x000000004aabd000..0x000000004aabec10]
        at java.lang.Object.wait(Native Method)
        at EDU.oswego.cs.dl.util.concurrent.CondVar.timedwait(CondVar.java:222)
        - locked <0x00002b5feef4a068> (a EDU.oswego.cs.dl.util.concurrent.CondVar)
        at org.jgroups.protocols.FC.handleDownMessage(FC.java:454)
        at org.jgroups.protocols.FC.down(FC.java:374)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:499)
        at org.jgroups.protocols.FC.receiveDownEvent(FC.java:368)
        at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:167)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:499)
        at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
        at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:294)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:499)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:390)
        at org.jgroups.JChannel.down(JChannel.java:1230)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:792)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passDown(MessageDispatcher.java:769)
        at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:304)
        at org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:444)
        at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:193)
        at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:432)
        at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:192)
        at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.cache.TreeCache.callRemoteMethodsViaReflection(TreeCache.java:4484)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4433)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4386)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4504)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
        at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:119)
        at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:379)
        at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:174)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:167)
        at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5934)
        at org.jboss.cache.TreeCache.put(TreeCache.java:3788)
        at sun.reflect.GeneratedMethodAccessor167.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
        at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
        at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
        at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
        at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
        at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:210)
        at $Proxy175.put(Unknown Source)
        at org.jboss.web.tomcat.service.session.JBossCacheWrapper.put(JBossCacheWrapper.java:138)
        at org.jboss.web.tomcat.service.session.JBossCacheService.putSession(JBossCacheService.java:325)
        at org.jboss.web.tomcat.service.session.JBossCacheClusteredSession.processSessionRepl(JBossCacheClusteredSession.java:125)
        - locked <0x00002b60227853b8> (a org.jboss.web.tomcat.service.session.SessionBasedClusteredSession)
        at org.jboss.web.tomcat.service.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:1153)
        at org.jboss.web.tomcat.service.session.JBossCacheManager.storeSession(JBossCacheManager.java:702)
        - locked <0x00002b60227853b8> (a org.jboss.web.tomcat.service.session.SessionBasedClusteredSession)
        ...
    
  • The "IncomingPacketHandler (channel=Tomcat-CLUSTER-PROD)" thread is waiting to lock an org.jboss.web.tomcat.service.session.SessionBasedClusteredSession monitor held by an http, https, or ajp connector thread:

    "IncomingPacketHandler (channel=Tomcat-CLUSTER-PROD)" daemon prio=1 tid=0x00002aaaad5b6e60 nid=0x73e6 waiting for monitor entry [0x000000004697b000..0x000000004697dc90]
         at org.jboss.web.tomcat.service.session.ClusteredSession.expire(ClusteredSession.java:818)
         - waiting to lock <0x00002b60227853b8> (a org.jboss.web.tomcat.service.session.SessionBasedClusteredSession)
    ...
    
  • JBoss is unresponsive and "java.lang.OutOfMemoryError: Java heap space" appears in the JBoss server.log.

  • The heap object histogram shows retention in org.jgroups.protocols.TP$IncomingQueueEntry.
  • The AJP thread pool locks up: in a cluster of two servers, one server has locked up. Its thread dump shows the lock around HTTP session replication and in the FC protocol.

One server is logging repeatedly:

WARN  [org.jgroups.protocols.FC] (IncomingPacketHandler (channel=Tomcat-PresentationPartition):) Received two credit requests from 10.252.1.10:34333 without any intervening messages; sending 1999936 credits

The other is logging repeatedly:

WARN  [org.jboss.cache.TreeCache](ajp-sgz100-sap8%2F192.168.1.10-8009-6:) node /JSESSION/localhost/test?online/82ninyZauN+3eJieuTfVHQ** not found
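
To gauge how often FC emits the credit-request warning and which peer triggers it, the server.log can be scanned for the message shown above. The following is a minimal Python sketch, not part of the original solution. The function name `count_credit_request_warnings` is hypothetical, and the pattern assumes the message text matches the sample log line exactly.

```python
import re
from collections import Counter

# Pattern taken from the sample FC warning above; it is an assumption that
# other JGroups versions log the message with identical wording.
FC_WARNING = re.compile(
    r"Received two credit requests from (?P<sender>\S+) "
    r"without any intervening messages; sending (?P<credits>\d+) credits"
)


def count_credit_request_warnings(log_lines):
    """Count FC credit-request warnings per sending member address."""
    counts = Counter()
    for line in log_lines:
        match = FC_WARNING.search(line)
        if match:
            counts[match.group("sender")] += 1
    return counts
```

The function accepts any iterable of lines, so an open server.log file object can be passed directly. A sender with a steadily growing count suggests that member is repeatedly blocked waiting for credits.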

Resolution

See "Repeated JGroups Flow Control (FC) Warning Messages" for potential causes of, and solutions to, the FC blocking situation.

JBoss EAP 4.3.x
Upgrade to cumulative patch release JBoss EAP 4.3 CP10 [1] or later, which contains the fix (JBPAPP-4094).

JBoss EAP 4.2.x
Upgrade to cumulative patch release JBoss EAP 4.2 CP010 or later if available; it contains the fix (JBPAPP-4094). Otherwise, request a one-off patch for the issue.

Note:
Even with this fix preventing the deadlock, poor performance can still result from the root issue that causes FC to block.

Both full release zips and patch zips are available. Patch zips are applied by replacing JARs and updating XML files.

[1] https://access.redhat.com/jbossnetwork/restricted/listSoftware.html?downloadType=distributions&product=appplatform&version=4.3.0.GA_CP10

Diagnostic Steps
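
The pattern described in the Issue section (a connector thread blocked in FC.handleDownMessage while holding a session monitor that another thread is waiting to lock) can be checked for in a thread dump. The following is a minimal Python sketch under that assumption, not a definitive tool. The function names `parse_thread_dump` and `find_fc_deadlock_candidates` are hypothetical, and the parsing assumes the thread dump format shown in the Issue section.

```python
import re

# Assumed format: thread header lines start with the quoted thread name, and
# monitor lines look like "- locked <0x...>" or "- waiting to lock <0x...>".
THREAD_HEADER = re.compile(r'^\s*"(?P<name>[^"]+)"')
LOCKED = re.compile(r"-\s+locked\s+<(?P<addr>0x[0-9a-fA-F]+)>")
WAITING = re.compile(r"-\s+waiting to lock\s+<(?P<addr>0x[0-9a-fA-F]+)>")
FC_FRAME = "org.jgroups.protocols.FC.handleDownMessage"


def parse_thread_dump(text):
    """Split a thread dump into one record per thread."""
    threads = []
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = {
                "name": header.group("name"),
                "in_fc": False,      # True if the stack contains the FC frame
                "locked": set(),     # monitor addresses this thread holds
                "waiting": set(),    # monitor addresses this thread wants
            }
            threads.append(current)
            continue
        if current is None:
            continue  # text before the first thread header
        if FC_FRAME in line:
            current["in_fc"] = True
        locked = LOCKED.search(line)
        if locked:
            current["locked"].add(locked.group("addr"))
        waiting = WAITING.search(line)
        if waiting:
            current["waiting"].add(waiting.group("addr"))
    return threads


def find_fc_deadlock_candidates(threads):
    """Return (waiter, holder, monitor) triples where the holder is in FC."""
    holders = {}
    for thread in threads:
        if thread["in_fc"]:
            for addr in thread["locked"]:
                holders.setdefault(addr, thread["name"])
    candidates = []
    for thread in threads:
        for addr in sorted(thread["waiting"]):
            if addr in holders:
                candidates.append((thread["name"], holders[addr], addr))
    return candidates
```

Passing the full text of a thread dump to `parse_thread_dump` and the result to `find_fc_deadlock_candidates` lists each thread that waits on a monitor held by a thread blocked in FC. An "IncomingPacketHandler" thread appearing as a waiter matches the situation described in this solution.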

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.