Exploring Concepts of RHEL High Availability Clusters - Fencing/STONITH
Contents
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 6, 7, or 8 with the High Availability Add-On
Useful References and Guides
Concepts
What is fencing?
Fencing - to fence - is an action carried out by a RHEL High Availability cluster to cut off a member's access to resources shared by the cluster. A fence action is executed against a cluster member that has entered some problematic state in which it can no longer successfully or safely participate in the cluster's activities.
Fencing is carried out by something referred to as a fence method or a fence device.
What is STONITH?
STONITH stands for "Shoot The Other Node In The Head". This acronym is another term for fencing - the violent description making clear that the goal is to quickly and forcefully disable the other node. The fence method must prevent that problematic node from accessing shared resources. "Killing" a server is an effective way to stop it from accessing a device or component that some other server needs exclusive access to.
STONITH is also a descriptive term applied to components related to fencing - such as a STONITH device (aka fence device), a STONITH action (aka fence action), a STONITH method (aka fence method), the stonith-ng daemon (the daemon responsible for fencing), etc.
What resources do clusters share, and why do they need protecting?
A RHEL High Availability cluster is intended to maintain uptime of applications, services, or other "resources" through various failure scenarios.
A cluster may share any number of different types of resources: network IP addresses that "float" around a cluster to run alongside an application; filesystems on shared devices; shared LVM volume groups; applications; services; databases; web servers; scripts; and more.
Organizations deploy high availability clusters because these resources are critical and cannot be allowed to be out of service for long (or at all). Conflicts, corruptions, failures, or other forms of unavailability would be unacceptable for such critical resources.
If two systems share access to a storage device without any form of coordination, a risk would exist that both systems could take action on that device at the same time in an incompatible way - such as writing different data to the same location, resulting in corruption. If two systems try to host the same IP address on the same network, clients may have difficulty communicating with the services they expect to reach at that IP.
So, resources have to be managed in a way that keeps them safe from conflicts, corruption, or other problems that could arise through failure scenarios or unexpected conditions.
How does fencing protect resources?
Many failure scenarios that a cluster must handle are capable of leaving the members unable to communicate with each other. Networks go down, servers crash or lose power, disasters happen - and in those situations, the cluster members can lose the ability to talk to each other.
Software failures may occur in ways that prevent an application or resource from being stopped. Problems with the operating system of a server could create some condition where the managed services of a cluster are stuck without some form of administrator intervention.
In these situations, the cluster software can choose some portion of the membership that should continue functioning, and have them execute a fence action against the problematic members. Those members that are fenced become cut off from their shared resources - unable to create conflicts or corruption while the "winning" side of the cluster continues to access those resources. Fencing gives some subset of members in a cluster the assurance that it is safe to carry out the functions of the cluster, free from interference from other members that aren't in a healthy state.
Resources are protected because the cluster software is designed to cease interacting with shared resources until fencing of problematic members is complete. If some nodes leave the cluster membership abruptly - any member that is still alive will stop trying to access resources until the cluster confirms that the missing members have had their access cut off. If some member disappeared and the other members need to coordinate to serve the resources it was previously hosting - they won't begin serving them until that fencing is complete.
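The gating described above can be sketched as a small simulation. None of these names correspond to a real pacemaker API - they only illustrate the ordering constraint that recovery waits on fence confirmation:

```python
# Illustrative only: recovery of a lost member's resources is gated on
# fence confirmation, mirroring the behavior described above.

def recover_resources(lost_member, fence_confirmed, start_resource):
    """Start the lost member's resources only once fencing is confirmed."""
    if not fence_confirmed(lost_member["name"]):
        # Touching shared resources now could conflict with the lost member.
        return []
    return [start_resource(r) for r in lost_member["resources"]]

node = {"name": "node2", "resources": ["vip", "fs", "db"]}

# Before fencing is confirmed, the survivors recover nothing.
print(recover_resources(node, lambda n: False, lambda r: f"started {r}"))  # []

# Once fencing is confirmed, they take over all of node2's resources.
print(recover_resources(node, lambda n: True, lambda r: f"started {r}"))
```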
How does the cluster decide when a member should be fenced?
Fencing may be executed against a node for a few reasons:
- Some failure occurs in the management of resources on a member:
  - The member may no longer be a good candidate to host resources - triggering fencing to forcefully prevent it from continuing.
  - The failure may have left resources in an unclean state on that member - triggering fencing to reset its state and/or keep it from interacting with resources in a bad state.
- Some event occurs that changes the membership of the cluster, such that some nodes are in contact with each other while others are out of contact.
How does the cluster use fencing when a resource operation fails in a pacemaker cluster?
pacemaker is the resource-manager in RHEL 7 and 8. On RHEL 6, pacemaker is one available resource-manager (with the other option being rgmanager).
In pacemaker clusters, a resource can be configured such that certain operation failures should trigger fencing of the node where the failure occurred.
By default, if a resource is being stopped and that stop cannot complete successfully, the cluster will fence the node where the stop failure occurred. A failed stop leaves the resource in an unknown state where it may not be safe to start in any other location, but where it may not be obvious to the cluster software what needs to be done to get it to fully stop - this is why the default for resources is op stop on-fail=fence. By fencing the node where the stop failure occurred, the unknown state of any resources is corrected, allowing another node to recover those resources into the ideal state.
Other operations can be configured to trigger fencing of a node that fails that operation - for example an administrator could instruct the cluster to fence a node if a monitor operation fails, or a start operation fails.
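In a pacemaker cluster these policies are set per operation. A hedged sketch using the pcs shell - the resource name "my-db" and the interval value are placeholders for illustration:

```shell
# Fence the node if a monitor of this (placeholder) resource fails.
pcs resource update my-db op monitor interval=30s on-fail=fence

# Stating the default stop policy explicitly looks like this:
pcs resource update my-db op stop on-fail=fence
```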
How does the cluster use fencing when a resource operation fails in an rgmanager cluster?
RHEL 6 offers an alternative resource-manager called rgmanager. rgmanager doesn't offer a standard way to trigger fencing on resource failures across all resource types. However, a few resource types implement a setting that instructs a cluster node to reboot itself if a stop operation fails; <lvm/> is one resource type with such a self_fence option.
These mechanisms don't directly trigger the cluster's fencing mechanism, but rather will trigger a hard-reboot of the node, which will cause the cluster to lose contact with it, thereby triggering the other nodes to fence it (see next section).
How does the cluster use fencing when a member loss or membership split occurs?
If some portion of the cluster's membership loses contact with other members, there is no simple way for those functional members to have awareness of what state the missing members are in. Those missing members could be crashed, busy processing some intensive task, have their network connection disrupted, or something else. Before any member can interact with shared resources, it must have confirmation that no missing member could be interacting with them in a conflicting way.
Quorum is a policy that gives only one portion of a cluster membership the authority to continue operating. This might be a "majority wins" policy, or it could be a system that defines authority based on some measure of member health. Either way, the quorum policy is meant to be the deciding factor for all nodes of a cluster as to which of them is allowed to continue operating.
- See also: Exploring concepts: quorum
Thus, when the membership changes or splits in some way, the quorum policy should result in each member seeing itself as either "quorate" or "inquorate". Quorate members - those meeting the conditions of quorum - coordinate amongst themselves to fence nodes that are no longer recognized members. If a member that has become inquorate is responsive enough to be able to process its own condition, it recognizes that it does not have authority to take further action within the cluster, and thus does not continue accessing shared resources.
Following a membership change, the quorate portion of a cluster will only proceed with resource management and interaction after STONITH of lost members has been confirmed as successful. This mandatory action protects resources from conflicting access by disconnected portions of a cluster.
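The "majority wins" flavor of this policy reduces to a simple vote count. A minimal sketch (the function name is illustrative, not part of any cluster software):

```python
def is_quorate(partition_votes, total_votes):
    """Majority-wins quorum: a partition may continue only if it holds
    strictly more than half of the cluster's total votes."""
    return partition_votes > total_votes // 2

# A 5-node cluster that splits 3/2: only the larger partition is quorate,
# and it must fence the other two nodes before resuming resource management.
print(is_quorate(3, 5))  # True
print(is_quorate(2, 5))  # False

# An even 2/2 split leaves neither side quorate - one reason two-node
# clusters require special handling.
print(is_quorate(2, 4))  # False
```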
What does it mean for a cluster to be in a "split-brain" state?
The term "split-brain" generally describes a problematic condition that high availability cluster solutions must avoid or address when members of such a cluster lose contact with each other. When cluster members are unable to communicate with each other, then they may be at risk of performing conflicting activities because they cannot coordinate with each other.
If the membership of a high availability cluster were to split into distinct sets of nodes that could not communicate with each other, and those "partitions" each considered itself qualified to continue managing shared resources - they would be in a split-brain state. A possible result of that split-brain state could be multiple systems mounting a non-cluster-aware filesystem and thereby corrupting it, for example.
The RHEL High Availability cluster software is designed to avoid split-brain conditions through two mechanisms:
- The quorum policy should result in only one partition of a cluster membership having authority to carry out tasks in the cluster.
- The cluster implementation should require that fencing/STONITH of missing members be completed before cluster activity can resume - thereby ensuring only one partition has access to shared resources.
What are some examples of how fencing can be carried out?
RHEL High Availability comes with various methods to execute fencing, including:
- Agents that connect with external power sources for nodes, such as PDUs and power outlets, and turn them off.
- Agents that connect with a node's BMC - Baseboard Management Controller, like an HP iLO, Dell DRAC, IBM RSA, or similar - to instruct it to power off the node.
- Agents that interact with a blade-server's management interface - like IBM Bladecenter, Cisco UCS, HP Moonshot, etc.
- Agents that log in to a switch that connects a node to its storage devices - like a Fibre Channel switch - and shut off ports.
- An agent that uses the SCSI protocol to define access controls on devices shared by the cluster, controlling which nodes are allowed to access them.
- A method that utilizes a watchdog timer device integrated into a node's platform to reset the node if it becomes unhealthy.
- A manual method: an administrator is alerted that some problem has occurred, powers off the server by hand, and then signals to the cluster that a "manual fencing" action has been executed against the node.
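As one concrete illustration of the BMC category, a fence device could be defined through the pcs shell roughly as follows. The address, credentials, and node name are placeholders, and the parameter names are those the RHEL 7 fence_ipmilan agent accepts:

```shell
# Placeholder values throughout; adjust for the actual BMC and node.
pcs stonith create fence-node1 fence_ipmilan \
    ipaddr=10.0.0.11 login=admin passwd=secret lanplus=1 \
    pcmk_host_list=node1
```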
Components Involved in Fencing
corosync
corosync is the engine of a cluster that handles membership and communication between members. If a node stops responding, or a communication problem disrupts node connectivity, it is corosync that will detect it through its communication protocols.
corosync feeds membership information to other components running in the cluster, and those components can decide if any change that occurred warrants fencing be carried out.
- See also: Exploring components: corosync
pacemaker as a STONITH manager
NOTE: pacemaker is only used in certain High Availability configurations. Some RHEL 6 clusters use an alternative cluster design that does not use pacemaker. All RHEL 7 High Availability clusters use pacemaker.
pacemaker is a component made up of various processes that carry out the important functions of a cluster. Its daemons coordinate to serve as a resource-manager, it has a daemon to calculate steps to serve resources in their ideal state, it has a daemon to track the configuration of the cluster, and more.
pacemaker provides the stonith-ng daemon - STONITH "next generation", being an improvement over previous STONITH implementations. This daemon is tasked with accepting requests to fence a member, reading the administrator-defined STONITH configuration, and executing the necessary fence actions to cut a node off from its resources.
stonith-ng may take requests from other pacemaker daemons when an event in the cluster warrants fencing, and may accept requests to fence a node upon an administrator's command.
pacemaker as a resource-manager
NOTE: pacemaker is only used in certain High Availability configurations. Some RHEL 6 clusters use an alternative cluster design that does not use pacemaker. All RHEL 7 High Availability clusters use pacemaker.
pacemaker also serves as the resource-manager of a High Availability cluster, in that it tracks states of all members and resources, and ensures that the resources are running on an available member according to the defined policies of the cluster.
As pacemaker's processes are coordinating to keep these resources in their ideal state, they monitor for failures or problems that occur with those resources. If a resource incurs a failure, pacemaker will attempt to recover from it and get the resource back into the ideal state. If a recovery action cannot succeed on some node, then the cluster may be designed to fence that node. This can give another node the ability to step in and recover the resources itself without interference from the problematic node.
Administrators can define policies that apply to failures of specific operations (start, stop, monitor, etc) carried out against individual resources. One such available policy is to have the cluster fence a node if the chosen operation fails. If this happens, the cluster daemons feed the request to stonith-ng and it carries out fencing against the problematic node.
fence-agents
The RHEL operating system is designed to run on various hardware platforms and virtualization environments. Thus RHEL High Availability cluster deployments come in many different shapes and styles, each of which may offer different possibilities for fencing a node.
Some cluster designs may have networked power sources for all nodes that could be accessed for fencing. Some clusters are made up of servers with BMCs available that can reboot their nodes on demand. Other clusters may be made up of virtual machines that have a hypervisor or hypervisor cluster that can be contacted to shut down those VMs. There are various other options besides these.
The fence-agents component of a cluster provides programs or scripts that the High Availability software can execute to interact with some fencing mechanism and engage it. These agents are implemented to accept certain configured inputs - this might be the IP address of the device, or details about which plug on the device should be engaged, the login and password, etc. These agents advertise to the cluster software what parameters they will accept, and the administrator can configure the cluster to pass in those parameters any time the agent must be executed.
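Those advertised parameters can be inspected directly. For example, the pcs shell can list the installed agents and print the parameters one of them publishes (fence_ipmilan is shown here only as an example):

```shell
# List the fence agents installed on this system.
pcs stonith list

# Show the parameters a particular agent advertises to the cluster.
pcs stonith describe fence_ipmilan
```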
sbd
sbd stands for Storage-based-death, and provides several different methods for achieving "self fencing" - a fencing strategy that enlists a node to control its own state and its own access to shared resources. Despite its name, not all of the sbd methods depend on storage, but one method uses shared storage devices to communicate fencing instructions throughout a cluster.
sbd offers a number of advantages over other STONITH methods that have been described, such as power-fencing and storage-access fencing.
- See also: Exploring components: sbd