Increased memory usage during JBoss EAP buddy replication state transfer

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 4.x, 5.x
  • Clustered configuration with buddy replication

Issue

  • The following exception appears in the JBoss server log:
ERROR [org.jboss.cache.buddyreplication.BuddyManager] Caught exception handling view change org.jboss.cache.CacheException: java.lang.OutOfMemoryError: GC overhead limit exceeded
  • Killing one node in a cluster results in "OutOfMemoryError: GC overhead limit exceeded" or "OutOfMemoryError: Java heap space" on the remaining nodes.
  • A heap dump shows that most of the retained memory belongs to a single AsyncViewChangeHandlerThread, which holds the data in a byte[] and/or an ExposedByteArrayOutputStream.

Resolution

Any of the following are possible solutions:

  • Increase the Java heap size (-Xmx).
  • Decrease the number or size of sessions (e.g. lower the session timeout if sessions are short-lived and load is even).
  • Disable session replication (typically not a good long-term solution in production).
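
After raising -Xmx, it can be useful to confirm that the running JVM actually picked up the new limit and to see how much headroom remains before a state transfer occurs. The following is a minimal sketch using only the standard java.lang.Runtime API. The class name HeapHeadroom and the helper usedFraction are hypothetical names chosen for illustration, and the sketch reports on whichever JVM it runs in, so it would need to be run inside the server JVM (for example from a test servlet) to reflect the JBoss heap.

```java
public class HeapHeadroom {

    /** Fraction of the maximum heap currently in use (0.0 to 1.0). */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // maxMemory() reflects the -Xmx limit (Long.MAX_VALUE if there is no limit).
        long max = rt.maxMemory();
        // Heap currently occupied = heap reserved so far minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();

        long mb = 1024L * 1024L;
        System.out.println("Max heap (MB):  " + (max / mb));
        System.out.println("Used heap (MB): " + (used / mb));
        System.out.println(String.format("Used fraction:  %.2f", usedFraction(used, max)));
    }
}
```

If the used fraction is already high under normal load, the temporary spike during state transfer has little room, which is when the OutOfMemoryError described above becomes likely.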

Root Cause

  • When a node goes down, each node that had the failed node as a buddy has to find a new buddy. Memory usage on the affected nodes increases temporarily because of the serialization, transfer, and deserialization needed to transfer state. After the redistribution, memory usage remains permanently higher, because each node has to store more state now that there are fewer nodes in the cluster among which to distribute it.
  • A node joining the cluster can cause the same issue, as it shifts buddies and triggers replication.
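
The permanent increase described above can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of JBoss Cache internals: it assumes session state is spread evenly across nodes, that each node keeps its own state plus backups for a fixed number of buddies, and it ignores the temporary serialization overhead during the transfer itself. The class name BuddyMemoryEstimate, the method perNodeBytes, and the example figures (3000 MB of total state, 4 nodes, 1 buddy) are all hypothetical.

```java
public class BuddyMemoryEstimate {

    /**
     * Approximate steady-state session data held per node:
     * its own share of the state plus one backup copy per buddy.
     */
    static double perNodeBytes(double totalStateBytes, int nodes, int numBuddies) {
        if (nodes < 1 || numBuddies < 0) {
            throw new IllegalArgumentException("nodes must be >= 1 and numBuddies >= 0");
        }
        // A node cannot have more buddies than there are other nodes.
        int effectiveBuddies = Math.min(numBuddies, nodes - 1);
        double ownShare = totalStateBytes / nodes;
        return ownShare * (1 + effectiveBuddies);
    }

    public static void main(String[] args) {
        double mb = 1024.0 * 1024.0;
        double totalState = 3000.0 * mb; // assumed total session state in the cluster

        double before = perNodeBytes(totalState, 4, 1); // 4 nodes, 1 buddy each
        double after = perNodeBytes(totalState, 3, 1);  // one node has left

        System.out.println(String.format("Per node before: %.0f MB", before / mb));
        System.out.println(String.format("Per node after:  %.0f MB", after / mb));
        // prints:
        // Per node before: 1500 MB
        // Per node after:  2000 MB
    }
}
```

Under these assumptions, losing one node out of four raises the steady-state footprint on each survivor by a third, and the temporary state-transfer buffers come on top of that, so heap sizing needs to leave room for both.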
