ISPN000196: Failed to recover cluster state after the current node became the coordinator

Solution Verified - Updated 2 Aug 2024

Environment

Red Hat JBoss Enterprise Application Platform (EAP)
- All EAP 6.x prior to 6.4 CP06
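The affected range above can be written as a simple version comparison. This is a minimal sketch, not a Red Hat tool. It assumes that EAP 6.4 CP06 corresponds to product version 6.4.6, and that the version string has the form `major.minor.micro` with an optional qualifier such as `.GA`. The class name `AffectedVersionCheck` is hypothetical.

```java
// Hypothetical helper: decides whether an EAP version falls in the affected range
// "all EAP 6.x prior to 6.4 CP06". Assumes CP06 == version 6.4.6.
final class AffectedVersionCheck {

    static boolean isAffected(String version) {
        // Expected form: major.minor.micro[.qualifier], e.g. "6.4.5.GA"
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int micro = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;

        if (major != 6) {
            return false;            // only EAP 6.x is listed as affected
        }
        if (minor < 4) {
            return true;             // 6.0 through 6.3 predate 6.4 CP06
        }
        return minor == 4 && micro < 6; // 6.4.0 through 6.4.5 (GA through CP05)
    }

    public static void main(String[] args) {
        for (String v : args) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

For example, `java AffectedVersionCheck 6.4.5.GA 6.4.6.GA` would report the first version as affected and the second as not affected.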

Issue

A cluster-wide rebalance failed with the following NullPointerException (NPE), resulting in a severe clustering issue such as multiple singletons running at the same time.

 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-2) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.NullPointerException
        at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:454)
        at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:234)
        at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:625)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_72]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_72]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_72]

Resolution

Apply JBoss EAP 6.4 Cumulative Patch 06 (CP06).
This NPE thrown from org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus() is fixed in EAP 6.4 CP06. For assistance, contact Red Hat Customer Support.

Root Cause

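Whenever the cluster view changes (for example, when a member joins or leaves the cluster), a cluster-wide rebalance runs to redistribute cache data among the current members. When a view change makes the current node the new coordinator, that node first recovers the cluster state in ClusterTopologyManagerImpl.recoverClusterStatus(), which is where the NPE in the stack trace above is thrown. Because this is a critical step of the cluster-wide rebalance, the failure can cause serious clustering problems such as the multiple singletons described above. EAP 6.4 CP06 handles the NPE properly.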

BZ-1283465: Failed to recover cluster state after the current node became the coordinator

Diagnostic Steps

Typically, many ERROR messages and stack traces are logged after the NPE. Determine how the cluster view changed by tracing the log messages with IDs ISPN000093 and ISPN000094. If the NPE appears between the view change and the subsequent ERROR messages, you are likely hitting this issue.
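The diagnostic steps above can be sketched as a small log scanner. This is a minimal illustration, not a Red Hat tool, and the class name `Ispn196Scanner` is hypothetical. It assumes the default EAP `server.log` format, where the level appears as ` ERROR ` surrounded by spaces (as in the log excerpt above). It also assumes that view changes are logged with ISPN000093 or ISPN000094 (typically the "Received new, MERGED cluster view" and "Received new cluster view" messages). It checks for the order described: view change, then the ISPN000196 NPE, then further ERROR lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical scanner for the pattern: view change -> ISPN000196 NPE -> more ERRORs.
final class Ispn196Scanner {

    static boolean isViewChange(String line) {
        return line.contains("ISPN000093") || line.contains("ISPN000094");
    }

    static boolean matchesPattern(List<String> lines) {
        boolean viewChangeSeen = false;
        boolean npeSeen = false;
        for (String line : lines) {
            if (isViewChange(line)) {
                viewChangeSeen = true;
            } else if (viewChangeSeen && line.contains("ISPN000196")
                    && line.contains("NullPointerException")) {
                // The NPE line itself is an ERROR line, so it is handled here
                // and does not count as one of the subsequent ERROR messages.
                npeSeen = true;
            } else if (npeSeen && line.contains(" ERROR ")) {
                return true; // an ERROR logged after the NPE completes the pattern
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ispn196Scanner <path-to-server.log>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence, so non-UTF-8 content cannot abort the scan.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String line : lines) {
            if (isViewChange(line)) {
                System.out.println("view change: " + line);
            }
        }
        System.out.println(matchesPattern(lines)
                ? "ISPN000196 NPE follows a view change and precedes further ERRORs: this issue is likely"
                : "pattern not found");
    }
}
```

A match only suggests this issue. Reviewing the printed view changes and the surrounding log context is still needed.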

SBR

JBoss Clustering

Product(s)

Red Hat Data Grid

Category

Troubleshoot
