Administrative Procedures for RHEL High Availability clusters - Enabling sbd fencing in RHEL 7 and higher
Contents
- Overview
- Prepare the environment
- Enable and configure sbd health and quorum monitoring
- Configure cluster fence methods
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7 or higher with the High Availability Add-On
Recommended Prior Reading
Useful References and Guides
- Exploring components: sbd and fence_sbd
- Design guidance: sbd fencing
- Deployment examples for RHEL High Availability clusters
Introduction
This article focuses on deploying sbd fencing in a RHEL High Availability cluster.
Red Hat's series of Administrative Procedures for RHEL High Availability Clusters aims to deliver simple, direct instructions for configuring common use cases with typical or recommended settings. There may be additional features, options, and decisions relevant to the targeted components that organizations will want to consider for their deployments; this guide serves as a helpful reference for getting up and running quickly with typical settings. The references and guides above are a useful starting point for understanding these components and the High Availability technologies in more detail.
NOTE: This guide presents optional steps for enabling sbd poison-pill fencing via block-device. Before starting, decide whether this feature will be enabled. Red Hat recommends including this feature for full fencing coverage. If this feature will not be enabled, skip those steps.
Prepare the environment
Prerequisites for enabling sbd fencing
- RHEL 7 or later cluster members have the operating system and cluster software installed
- Cluster has been set up (e.g., with pcs cluster setup)
Install base software
On every node:
# yum install sbd
Optional: If enabling sbd poison-pill fencing via block-device
# yum install fence-agents-sbd
Confirm a suitable watchdog device
Each node must have access to a watchdog timer device that meets Red Hat's support requirements and that will perform the necessary reset functions. This is a good time to stop and confirm that each node's watchdog timer device meets those requirements and performs as expected in testing.
- See also: Support policies - sbd and fence_sbd
- See also: Diagnostic procedure - Validating a Watchdog Timer Device (WDT) for Usage with sbd
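As a quick first check, the following commands can confirm that a watchdog device node is present on a node and identify which driver provides it. This is only a sketch; full validation should follow the diagnostic procedure referenced above.

```shell
# List watchdog device nodes presented by the kernel
ls -l /dev/watchdog*

# wdctl (from util-linux) reports the driver, timeout, and
# identity of the watchdog device
wdctl /dev/watchdog
```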
Optional: Present and confirm shared storage device(s) for sbd poison-pill fencing via block-device
To use sbd poison-pill fencing via block-device, each node must have access to 1-3 shared devices that will be dedicated to usage by sbd and fence_sbd. Requirements for this device and guidance on choosing the storage layout can be found in the Useful References and Guides listed above.
Present the block device(s) to each node in the cluster, then confirm that each node has visibility with tools like fdisk, multipath, or similar.
Note: Even when using poison-pill, fence_sbd requires that a supported watchdog device be configured.
Perform any necessary preparation on the device, such as mapping it in device-mapper-multipath or another multipathing technology, if applicable.
Identify the path to the device as it is seen by each node. Note that depending on the way in which the device is configured and accessed, it may be known by a different path on each system. For example, what is /dev/sdc on one node may be /dev/sdd on another node; likewise with device-mapper-multipath a device could be /dev/mapper/mpathb and /dev/mapper/mpathc on different nodes. It is important to identify one path name that represents the same block device on every node. Here are a few tips to achieve this:
- /dev/disk/ contains several directories with persistently-named symlinks to block devices. These can be useful for referencing a device by some property, like its ID (aka WWID or SCSI ID), or its "path" (in reference to how the device is presented to the system).
- /dev/disk/by-id/scsi-<ID> and /dev/disk/by-id/dm-uuid-mpath-<ID> are common choices for referencing a device by a consistent name across systems.
- device-mapper-multipath devices can be configured to be named by UUID instead of by "user friendly names". See /etc/multipath.conf and its manpage for details on "user_friendly_names".
- device-mapper-multipath devices can have an alias configured in /etc/multipath.conf, allowing administrators to set the name consistently on each node.
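For example, the commands below can help confirm that a persistent name refers to the same underlying device on each node. The by-id path shown is the example device used throughout this article, and /dev/sdc is a hypothetical path for illustration.

```shell
# Resolve a persistent symlink to this node's kernel device name
readlink -f /dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447

# Query the SCSI ID (WWID) of a device directly
/usr/lib/udev/scsi_id -g -u -d /dev/sdc

# Show multipath topology to match WWIDs to their member paths
multipath -ll
```

Running the same checks on every node and comparing the reported IDs confirms that one path name represents the same block device cluster-wide.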
Optional: Initialize shared storage device(s) for sbd poison-pill fencing via block-device
This step is required for sbd poison-pill fencing via block-device, but not required if that feature will be omitted.
WARNING: This will destroy data on the specified device(s) in preparation for usage with sbd. Only proceed if the specified device(s) contain(s) no important data that must be preserved.
Specify each of the chosen block devices in the pcs stonith sbd device setup command (specification of your watchdog device will come later):
RHEL 7
# # Syntax:
# pcs stonith sbd device setup --device=<device path> [--device=<device path>] [--device=<device path>]
#
# # Example with a single device:
# pcs stonith sbd device setup --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447
#
# # Example with multiple devices:
# pcs stonith sbd device setup --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447 --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413def1258 --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ber1548
RHEL 8 and higher
# # Syntax:
# pcs stonith sbd device setup device=<device path> [device=<device path>] [device=<device path>]
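After initialization, the sbd metadata header on each device can be inspected to confirm it was written. The device path below is the single-device example from above.

```shell
# Dump the sbd header from an initialized device; a valid header
# reports the configured timeouts and the number of node slots
sbd -d /dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447 dump
```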
Enable and configure sbd health and quorum monitoring
Option 1: With sbd poison-pill fencing via block-device
NOTE: This will restart the cluster entirely to enable sbd on all nodes. Only proceed if the cluster is in a state where nodes and resources are able to undergo an outage.
NOTE: This procedure will enable corosync's auto_tie_breaker feature for quorum arbitration if all of the following are true:
- The cluster has an even number of nodes
- The cluster does not yet have
auto_tie_breakerenabled - The cluster does not use
qdevice
If there is a desire to have more control over the quorum arbitration configuration instead of simply enabling auto_tie_breaker, stop and perform that configuration now before proceeding to enable sbd.
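The current quorum configuration can be reviewed before proceeding, for example:

```shell
# Show the live quorum state and flags; the "Flags:" line reports
# AutoTieBreaker when that feature is enabled
corosync-quorumtool -s

# Show the pcs view of the quorum configuration
pcs quorum config
```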
RHEL 7
# # Syntax:
# pcs stonith sbd enable --device=<device> [--device=<device>] [--device=<device>] [--watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
To use the typical defaults, just specify the shared device/devices initialized earlier:
# # Example with a single device:
# pcs stonith sbd enable --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447 [--watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
#
# # Example with multiple devices:
# pcs stonith sbd enable --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447 --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413def1258 --device=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ber1548 [--watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
RHEL 8 and higher
# # Syntax:
# pcs stonith sbd enable device=<device> [device=<device>] [device=<device>] [watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
See pcs stonith help and Red Hat's design guidance on sbd for further guidance on enabling sbd with various options that are available.
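Once the cluster has restarted with sbd enabled, its state on each node can be confirmed with:

```shell
# Report whether sbd is enabled and running on each cluster node
pcs stonith sbd status

# Show the sbd configuration that was distributed to the nodes
pcs stonith sbd config
```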
Option 2: Without sbd poison-pill fencing via block-device
NOTE: This will restart the cluster entirely to enable sbd on all nodes. Only proceed if the cluster is in a state where nodes and resources are able to undergo an outage.
NOTE: This procedure will enable corosync's auto_tie_breaker feature for quorum arbitration if all of the following are true:
- The cluster has an even number of nodes
- The cluster does not yet have
auto_tie_breakerenabled - The cluster does not use
qdevice
If there is a desire to have more control over the quorum arbitration configuration instead of simply enabling auto_tie_breaker, stop and perform that configuration now before proceeding to enable sbd.
Enable sbd health and quorum monitoring with:
RHEL 7
# # Syntax:
# pcs stonith sbd enable [--watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
# # Example:
# pcs stonith sbd enable
RHEL 8 and higher
# # Syntax:
# pcs stonith sbd enable [watchdog=<path>[@<node>]] ... [<SBD_OPTION>=<value>]
# # Example:
# pcs stonith sbd enable
See pcs stonith help and Red Hat's design guidance on sbd for further guidance on enabling sbd with various options that are available.
Configure cluster fence methods
Enable stonith-watchdog-timeout fencing
Configure the cluster property stonith-watchdog-timeout with a value greater than 0. This value should be larger than the SBD_WATCHDOG_TIMEOUT setting configured in the earlier pcs stonith sbd enable step, which defaults to 5 seconds; a common practice is to use twice that value. If SBD_WATCHDOG_TIMEOUT was left at the default, stonith-watchdog-timeout can be set to 10s.
# # Example with the default SBD_WATCHDOG_TIMEOUT=5 (seconds)
# pcs property set stonith-watchdog-timeout=10
For more information on choosing this value, see Red Hat's design guidance for sbd fencing.
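As a minimal sketch of the rule of thumb above (doubling SBD_WATCHDOG_TIMEOUT to derive the property value):

```shell
# SBD_WATCHDOG_TIMEOUT as configured (5 is the sbd default)
sbd_watchdog_timeout=5

# Double it to derive the stonith-watchdog-timeout property value
stonith_watchdog_timeout=$((sbd_watchdog_timeout * 2))
echo "$stonith_watchdog_timeout"
```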
Optional: Create fence_sbd STONITH device for sbd poison-pill fencing via block-device
# # Syntax:
# pcs stonith create <name> <agent> [<agent attributes>] [<stonith attributes>]
# #
# # For fence_sbd:
# pcs stonith create <name> fence_sbd devices=<list of device paths, comma-separated>
From one node in the cluster, create the STONITH device. Give the paths to any devices that were chosen for usage with sbd and initialized in the earlier step.
# # Example with single device:
# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447
#
# # Example with multiple devices:
# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447,/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413def1258,/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ber1548
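With the STONITH device in place, it can be checked and then exercised end-to-end. The node name below is hypothetical, and manually fencing a node will reset it, so only do so during a maintenance window.

```shell
# Verify the device is registered and started
# (on RHEL 8 and higher, use "pcs stonith config sbd" instead)
pcs stonith show sbd

# Manually fence a node to confirm end-to-end operation.
# WARNING: this reboots the target node.
pcs stonith fence rhel7-node2.example.com
```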
Optional: Configure fence_sbd device into STONITH levels with any other devices in the cluster
NOTE: This step is only needed if there are to be additional STONITH devices corresponding to any node, beyond the fence_sbd device. The levels - or "topology" - instruct the cluster what order to execute the devices in.
Typically fence_sbd should come after any diagnostic methods, like fence_kdump. fence_sbd can come either before or after other methods, such as power or storage fencing, depending on preference and requirements. If using fence_kdump, then in addition to setting up the fencing levels, follow the instructions in the article "sbd watchdog timeout causes node to reboot during crash kernel execution".
For example, combining fence_sbd with fence_kdump (both of which have already been created in this example):
# pcs stonith show --full
Resource: kdump (class=stonith type=fence_kdump)
Attributes: pcmk_host_list="rhel7-node1.example.com rhel7-node2.example.com rhel7-node3.example.com rhel7-node4.example.com"
Operations: monitor interval=60s (kdump-monitor-interval-60s)
Resource: sbd (class=stonith type=fence_sbd)
Attributes: devices=/dev/disk/by-id/dm-uuid-mpath-360014054540864241b04b67af8351a38,/dev/disk/by-id/dm-uuid-mpath-360014058b858513c3044413ace481447,/dev/disk/by-id/dm-uuid-mpath-360014059b3b71fdd6254d519c085050c
Operations: monitor interval=60s (sbd-monitor-interval-60s)
# pcs stonith level add 1 rhel7-node1.example.com kdump
# pcs stonith level add 2 rhel7-node1.example.com sbd
# # [... repeat for all nodes ...]
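After the levels have been added for every node, the resulting topology can be reviewed and validated:

```shell
# List the configured fencing levels for all nodes
pcs stonith level

# Check the level configuration for errors, such as references
# to devices or nodes that do not exist
pcs stonith level verify
```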