How to configure/manage STONITH 'levels' in RHEL cluster with pacemaker?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL) 6
  • Red Hat Enterprise Linux (RHEL) 7
  • Red Hat Enterprise Linux (RHEL) 8
  • Red Hat Enterprise Linux (RHEL) 9
  • pacemaker
  • STONITH levels

Issue

  • How to configure more than one fence device in a cluster?
  • How to configure/manage STONITH levels in RHEL cluster with pacemaker?
  • How to configure fence agent fence_ipmilan in RHEL cluster with pacemaker?
  • How to configure fence agent fence_apc in RHEL cluster with pacemaker?

Resolution

In an HA cluster with a single STONITH device, that device is a single point of failure. To address this, "fencing-topology" was added to pacemaker to support multiple fence devices and more complex fencing configurations. In pcs, this is done by configuring STONITH levels for the fence devices.

A very popular method of fencing is to use a fence device built into the host, such as IPMI (fence_ipmilan). The IPMI interface draws its power from the host's power supply. One drawback of such fence devices is that if the host loses power, the IPMI interface cannot respond to fence requests and the fence action fails.

If the IPMI's network connection uses a single network interface, then a broken or disconnected network cable, a failed switch or switch port, or a failure in the NIC itself would also leave the IPMI interface inaccessible, and the cluster node would not be fenced.

The simple solution to this issue is to configure a second fence method. When a cluster node needs to be fenced (STONITH), all the operations in level 1 are attempted. If they succeed, no other levels are executed; if that level fails, fencing proceeds to the next level. If the operations in every level have been tried and all have failed, fencing loops back to level 1 and starts over. This loop continues until one of the following occurs:

  • The cluster node rejoins the cluster after a reboot has occurred.
  • The cluster node was successfully fenced off from the cluster with the fence agent.
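
The escalation order described above can be illustrated with a small simulation. This is only a sketch: it does not call pacemaker or pcs, `try_level` and `fence_node` are hypothetical helper names, the per-level outcomes are made-up test data, and the pass limit exists only so that the simulation terminates.

```shell
#!/usr/bin/env bash
# Toy simulation of fencing-level escalation; nothing here talks to a cluster.

# Assumed outcome per level, in order: 1 = failure, 0 = success.
# Here level 1 (for example IPMI) fails and level 2 (for example APC) succeeds.
LEVEL_RESULTS=(1 0)

# Hypothetical stand-in for "run every operation in this level".
try_level() {
    local level=$1
    return "${LEVEL_RESULTS[level-1]}"
}

# Try the levels in order; after the last level, start again at level 1.
# max_passes is a simulation-only limit, not a pacemaker setting.
fence_node() {
    local node=$1 max_passes=$2 pass level
    for ((pass = 1; pass <= max_passes; pass++)); do
        for ((level = 1; level <= ${#LEVEL_RESULTS[@]}; level++)); do
            if try_level "$level"; then
                echo "$node fenced by level $level (pass $pass)"
                return 0
            fi
            echo "level $level failed for $node, escalating"
        done
    done
    echo "all levels failed for $node after $max_passes passes"
    return 1
}

fence_node node1.example.com 3
```

With the assumed outcomes, level 1 is reported as failed and the node is fenced by level 2 on the first pass.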

In the STONITH configuration below, two fence agents, fence_ipmilan and fence_apc, are configured so that:

  • The fence_ipmilan fencing agent is tried first.
  • If the fence_ipmilan fencing agent does not succeed then the fence_apc agent will be tried.

Steps to implement STONITH levels

1. Configure the two desired fence devices. In this case, the fence_ipmilan and fence_apc fence agents are used.

# pcs stonith create ipmi-fencing1 fence_ipmilan pcmk_host_list="node1.example.com" ipaddr="10.65.208.102" login=root passwd=xxx op monitor interval=30s
# pcs stonith create ipmi-fencing2 fence_ipmilan pcmk_host_list="node2.example.com" ipaddr="10.65.208.103" login=root passwd=xxx op monitor interval=30s

# pcs stonith create apc-fencing1 fence_apc pcmk_host_list="node1.example.com" ipaddr="10.65.208.31" login=root passwd=xxx port=14 action=reboot op monitor interval=30s
# pcs stonith create apc-fencing2 fence_apc pcmk_host_list="node2.example.com" ipaddr="10.65.208.31" login=root passwd=xxx port=12 action=reboot op monitor interval=30s

The configuration would look similar to:

# pcs status
Cluster name: rhel7testcluster
Last updated: Sun May 18 09:30:40 2014
Last change: Mon Apr 28 10:02:59 2014 via cibadmin on node2.example.com
Stack: corosync
Current DC: node1.example.com (1) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
6 Resources configured


Online: [ node1.example.com node2.example.com ]

Full list of resources:

 ipmi-fencing1	(stonith:fence_ipmilan):	Started node1.example.com 
 ipmi-fencing2	(stonith:fence_ipmilan):	Started node2.example.com 
 apc-fencing1	(stonith:fence_apc):	Started node1.example.com 
 apc-fencing2	(stonith:fence_apc):	Started node2.example.com 

PCSD Status:
  node1.example.com: Online
  node2.example.com: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

2. Create STONITH levels

[root@node1 ~]# pcs stonith level add 1 node1.example.com ipmi-fencing1
[root@node1 ~]# pcs stonith level add 1 node2.example.com ipmi-fencing2
[root@node1 ~]# pcs stonith level add 2 node1.example.com apc-fencing1
[root@node1 ~]# pcs stonith level add 2 node2.example.com apc-fencing2
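
For clusters with more nodes, the same commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed before being run by hand; `print_level_cmds` is a hypothetical helper, and it assumes the devices are named `ipmi-fencingN` and `apc-fencingN` for the Nth node, as in step 1.

```shell
#!/usr/bin/env bash
# Prints (does not run) the "pcs stonith level add" commands for each node.
# Assumption: the Nth node argument uses devices ipmi-fencingN and apc-fencingN.

print_level_cmds() {
    local i=1 node
    for node in "$@"; do
        echo "pcs stonith level add 1 $node ipmi-fencing$i"   # tried first
        echo "pcs stonith level add 2 $node apc-fencing$i"    # fallback
        i=$((i + 1))
    done
}

print_level_cmds node1.example.com node2.example.com
```

For the two nodes in this article, the output is the same four commands shown in step 2.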

3. Check the configuration

# pcs stonith level        <-- lists all of the fencing levels currently configured
 Node: node1.example.com
  Level 1 - ipmi-fencing1
  Level 2 - apc-fencing1
 Node: node2.example.com
  Level 1 - ipmi-fencing2
  Level 2 - apc-fencing2
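
To confirm that every node really has a fallback level, the listing can be summarized per node. The following is a minimal sketch, assuming the `Node:` / `Level N - device` layout shown above (other pcs releases may print a different layout); `count_levels` is a hypothetical helper, and the sample text is copied from this article rather than read from a live cluster. On a real cluster the input would come from `pcs stonith level` instead.

```shell
#!/usr/bin/env bash
# Reads "pcs stonith level"-style text on stdin and prints "<node> <level count>".
# Assumes the "Node:" / "Level N - device" layout shown in this article.

count_levels() {
    awk '
        $1 == "Node:" { node = $2; count[node] = 0; order[++n] = node }
        $1 == "Level" { count[node]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}

# Sample listing copied from the output above (not from a live cluster).
sample_output=' Node: node1.example.com
  Level 1 - ipmi-fencing1
  Level 2 - apc-fencing1
 Node: node2.example.com
  Level 1 - ipmi-fencing2
  Level 2 - apc-fencing2'

printf '%s\n' "$sample_output" | count_levels
```

A node that reports a count of 1 has no second fence method to fall back on.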

Note: To clear the fence levels, use:

# pcs stonith level clear

This clears the fence levels on the specified node (or stonith id), or clears all fence levels if no node or stonith id is specified. If more than one stonith id is specified, they must be separated by commas with no spaces. Example: pcs stonith level clear dev_a,dev_b

Run the list command again to confirm that the levels were cleared:

# pcs stonith level

To remove a particular device from a STONITH level:

# pcs stonith level
 Node: node1.example.com
  Level 1 - ipmi-fencing1
  Level 2 - apc-fencing1
 Node: node2.example.com
  Level 1 - ipmi-fencing2
  Level 2 - apc-fencing2


# pcs stonith level remove 2 node1.example.com apc-fencing1 <-- removes second fencing method for node1

# pcs stonith level
 Node: node1.example.com
  Level 1 - ipmi-fencing1
 Node: node2.example.com
  Level 1 - ipmi-fencing2
  Level 2 - apc-fencing2

The fence device itself is still available as a cluster resource after it is removed from the level:

# pcs status
[.... ]

Full list of resources:

 apc-fencing1	(stonith:fence_apc):	Started node1.example.com 
 apc-fencing2	(stonith:fence_apc):	Started node2.example.com 
 ipmi-fencing1	(stonith:fence_ipmilan):	Started node1.example.com 
 ipmi-fencing2	(stonith:fence_ipmilan):	Started node2.example.com 
