Why does a cluster node fence other nodes that haven't started cman yet when it initially joins the cluster in RHEL 4, 5, or 6?
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
- Red Hat Cluster Suite (RHCS) 4
- fencedevices defined in /etc/cluster/cluster.conf, and fenced enabled for use by the cluster (FENCE_JOIN="yes" or left unset in /etc/sysconfig/cman)
- <fence_daemon clean_start="0"> or left unset in /etc/cluster/cluster.conf
Issue
- When starting cman on node 1, it fences node 2 (which has not yet started cman).
- Nodes that are not in the cluster are rebooted when other nodes join the cluster for the first time.
- Two nodes in the same cluster were rebooted. Each node sent a reboot signal to the other: one node fenced the other first, and the fenced node then fenced the first node as it came back up.
- Why is my cluster node powering off a node that isn't in the cluster when it is booted up?
Resolution
- If possible, start cman on all nodes at the same time, or boot them at the same time if it is chkconfig'd on.
- Consider increasing post_join_delay in /etc/cluster/cluster.conf if more time is needed before fencing is carried out at startup, such as when nodes may be started in a staggered fashion or take different amounts of time to boot up.
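As a rough illustration of the second option, the fragment below sketches where post_join_delay is set in /etc/cluster/cluster.conf. The cluster name, the config_version value, and the 60-second delay are placeholders chosen for this example, not recommended values. The clusternodes and fencedevices sections of a real configuration are omitted.

```xml
<!-- Fragment of /etc/cluster/cluster.conf. The name, config_version, and
     delay values are illustrative assumptions. A real file also contains
     the clusternodes and fencedevices sections, omitted here. -->
<cluster name="examplecluster" config_version="2">
  <!-- Wait 60 seconds after a node joins the fence domain before fencing
       nodes that have not yet joined. Choose a value that covers the
       longest expected gap between node startups. -->
  <fence_daemon clean_start="0" post_join_delay="60"/>
</cluster>
```

After editing the file, increment config_version and propagate the updated configuration to all nodes using the method appropriate to the release in use. A longer delay gives slow-booting nodes time to join, and it also delays fencing of a node that has genuinely failed to start.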
Root Cause
When a cluster membership forms for the first time and gains quorum, the nodes in that membership will fence any nodes that have not joined after post_join_delay seconds. This is necessary to ensure that any nodes which are not in communication have released any shared resources, and are reset back to a "known" state.