ISPN000196: Failed to recover cluster state after the current node became the coordinator
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- All EAP 6.x prior to 6.4 CP06
Issue
- A cluster-wide rebalance failed with the following NPE, resulting in severe clustering issues such as multiple singletons.
ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-2) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.NullPointerException
at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:454)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:234)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:625)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_72]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_72]
Resolution
Apply JBoss EAP 6.4 Cumulative Patch (CP) 6
This NPE in org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus() is fixed in EAP 6.4 CP06. Contact Customer Support.
Root Cause
Whenever the cluster view changes (for example, when a member joins or leaves the cluster), a cluster-wide rebalance runs to distribute cache data among the current members. Because the NPE is thrown from a critical part of the cluster-wide rebalance, it can result in a serious clustering issue. The NPE is handled properly in EAP 6.4 CP06.
Diagnostic Steps
The NPE is usually followed by many ERROR log messages and stack traces. Determine how the cluster view changed by tracing log IDs ISPN000093 and ISPN000094. If the NPE appears between the view change and these ERROR messages, you may have hit this issue.
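The check above can be partly automated. The following is a minimal sketch, not a supported tool: it assumes a plain-text server log with one message per line, that ERROR messages contain the literal text "ERROR", and that the ISPN000196 line carries the exception class name as in the excerpt under Issue. The function name scan_log and the returned field names are hypothetical.

```python
# Log IDs named in this article: cluster view changes and the failed recovery.
VIEW_CHANGE_IDS = ("ISPN000093", "ISPN000094")
RECOVERY_FAILURE_ID = "ISPN000196"


def scan_log(lines):
    """Find ISPN000196 NPEs and the view change that preceded each one.

    Returns one dict per NPE with the line number of the most recent
    view change (None if none was seen), the line number of the NPE,
    and the count of ERROR lines logged after it (up to the next NPE).
    """
    last_view_change = None
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(log_id in line for log_id in VIEW_CHANGE_IDS):
            # Remember the latest view change seen so far.
            last_view_change = number
        elif RECOVERY_FAILURE_ID in line and "NullPointerException" in line:
            hits.append({
                "view_change_line": last_view_change,
                "npe_line": number,
                "errors_after": 0,
            })
        elif "ERROR" in line and hits:
            # Attribute follow-on ERROR messages to the latest NPE.
            hits[-1]["errors_after"] += 1
    return hits
```

To use it, pass the open log file, for example scan_log(open("server.log")), where the file name is an assumption about your logging configuration. A result whose view_change_line is set and whose errors_after is greater than zero matches the pattern described above; it is an indication to investigate further, not proof of this issue.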
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.