Using SCSI Persistent Reservations with Red Hat Enterprise Linux 4 or 5
1 Introduction
When cluster nodes share storage devices, access to those devices must be controlled. In the event of a node failure, the failed node should not have access to the underlying storage devices. SCSI persistent reservations provide the capability to control the access of each node to shared storage devices. Red Hat Enterprise Linux 5 Advanced Platform employs SCSI persistent reservations as a fencing method through the use of the fence_scsi agent. The fence_scsi agent provides a method to revoke access to shared storage devices, provided that the storage supports SCSI persistent reservations.
Using SCSI reservations as a fencing method is quite different from traditional power fencing methods. It is very important to understand the software, hardware, and configuration requirements prior to using SCSI persistent reservations as a fencing method.
This document describes the use of SCSI persistent reservations for Red Hat Clustering 4 and 5. Information about using SCSI persistent reservations with newer versions of Red Hat Clustering can be found here: Using SCSI Persistent Reservations with Red Hat Enterprise Linux 6
2 Overview
In order to understand how Red Hat Enterprise Linux 5 Advanced Platform is able to use SCSI persistent reservations as a fencing method, it is helpful to have some basic knowledge of SCSI persistent reservations.
There are two important concepts within SCSI persistent reservations that should be made clear: registrations and reservations.
2.1 Registrations
A registration occurs when a node registers a unique key with a device. A device can have many registrations. For our purposes, each node will create a registration on each device.
2.2 Reservations
A reservation dictates how a device can be accessed. In contrast to registrations, there can be only one reservation on a device at any time. The node that holds the reservation is known as the "reservation holder". The reservation defines how other nodes may access the device. For example, fence_scsi uses a "Write Exclusive, Registrants Only" reservation. This type of reservation indicates that only nodes that have registered with that device may write to the device.
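The relationship between registrations and a "Write Exclusive, Registrants Only" reservation can be sketched with a small in-memory model. This is only an illustration in Python 3: the Device class, its method names, and the node key strings are hypothetical, and no real SCSI commands are issued.

```python
# Toy in-memory model of one shared device; not a real SCSI interface.
class Device:
    def __init__(self):
        self.registrations = set()  # many keys may be registered at once
        self.holder = None          # at most one reservation per device

    def register(self, key):
        self.registrations.add(key)

    def reserve(self, key):
        # Only a registered key may take the reservation, and only if none exists.
        if key not in self.registrations:
            raise PermissionError("key is not registered")
        if self.holder is None:
            self.holder = key

    def can_write(self, key):
        # "Write Exclusive, Registrants Only": once reserved, only registrants write.
        if self.holder is None:
            return True
        return key in self.registrations


dev = Device()
dev.register("node1")
dev.register("node2")
dev.reserve("node1")
print(dev.can_write("node2"))  # node2 is registered
print(dev.can_write("node3"))  # node3 never registered
```

In this model, node2 may write even though node1 holds the reservation, because the reservation type grants write access to every registrant rather than only to the holder.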
2.3 Fencing
The fence_scsi agent is able to perform fencing via SCSI persistent reservations by simply removing a node's registration key from all devices. When a node failure occurs, the fence_scsi agent will remove the failed node's key from all devices, thus preventing it from being able to write to those devices.
3 Requirements
3.1 Software Requirements
In order to use SCSI persistent reservations as a fencing method, several requirements must be met.
- Red Hat Cluster Suite 4.5 or greater
- Red Hat Enterprise Linux 5.0 Advanced Platform or greater
The sg3_utils package must also be installed. This package provides the tools needed by the various scripts to manage SCSI persistent reservations.
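As a rough illustration of the tooling involved, sg3_utils includes the sg_persist utility for issuing persistent reservation commands. The Python 3 sketch below only builds argument lists for a few common operations and does not execute anything. The helper function names and the example key are hypothetical, and whether the cluster scripts invoke sg_persist with exactly these options is an assumption.

```python
# Builds sg_persist argument lists; nothing is executed here.
PROUT_TYPE_WERO = "5"  # PR type 5: Write Exclusive, Registrants Only


def read_keys_cmd(device):
    # List the keys currently registered with the device.
    return ["sg_persist", "--in", "--read-keys", f"--device={device}"]


def register_cmd(key, device):
    # Register a new key (hex string) with the device.
    return ["sg_persist", "--out", "--register",
            f"--param-sark={key}", f"--device={device}"]


def reserve_cmd(key, device):
    # Take the reservation using an already registered key.
    return ["sg_persist", "--out", "--reserve", f"--param-rk={key}",
            f"--prout-type={PROUT_TYPE_WERO}", f"--device={device}"]


def preempt_abort_cmd(own_key, victim_key, device):
    # Remove another node's key, as a fencing operation would.
    return ["sg_persist", "--out", "--preempt-abort",
            f"--param-rk={own_key}", f"--param-sark={victim_key}",
            f"--prout-type={PROUT_TYPE_WERO}", f"--device={device}"]


print(" ".join(register_cmd("4d20001", "/dev/sdb")))
```

Each list could be passed to subprocess.run on a test system, but only against devices that are not in production use.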
3.2 Storage Requirements
In order to use SCSI persistent reservations as a fencing method, all shared storage must use LVM2 cluster volumes. In addition, all devices within these volumes must be SPC-3 compliant. SCSI-2 devices are not supported. If you are unsure whether your cluster and shared storage environment meet these requirements, a script is available to determine whether your shared storage devices are capable of using SCSI persistent reservations. See section 5.1.
4 Limitations
In addition to these requirements, fencing by way of SCSI persistent reservations also has some limitations.
- Multipath devices are currently only supported for Red Hat Enterprise Linux 5.0 Advanced Platform and later with the use of device-mapper-multipath.
- Use with HA-LVM is not supported.
  - HA-LVM provides a way of dynamically mounting LVM volume groups (VGs) as a resource needs them, while clvmd provides a mechanism for being able to simultaneously mount a VG across all nodes. Therefore using HA-LVM in conjunction with clvmd does not make sense, since if you have clvmd you don't need HA-LVM.
  - Because fence_scsi in RHEL5 requires clvmd to determine which block devices to register against, this precludes usage of HA-LVM.
  - In a future version of RHEL the clvm requirement may be lifted from fence_scsi. When this happens fence_scsi and HA-LVM will no longer be mutually exclusive.
- All nodes in the cluster must have a consistent view of storage. In other words, all nodes in the cluster must register with the same devices. This limitation exists for the simple reason that each node must be able to remove another node's registration key from all the devices that it registered with. In order to do this, the node performing the fencing operation must be aware of all devices that other nodes are registered with. If all cluster nodes have a consistent view of storage, this requirement is met.
- Devices used for the cluster volumes should be a complete LUN, not partitions. SCSI persistent reservations work on an entire LUN, meaning that access is controlled to each LUN, not individual partitions.
- The fence_scsi agent can be used with 2-node clusters only in Red Hat Enterprise Linux 5.4.z and later.
- The fence_scsi agent can be used with qdiskd only in Red Hat Enterprise Linux 5.4.z and later. The quorum disk must be separate from CLVM LUNs (see qdisk man page section 1.4 "Limitations" for more information).
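The "consistent view of storage" limitation in the list above can be checked mechanically once each node's device list has been collected. The Python 3 sketch below assumes the lists have already been gathered by some other means and that devices are named by an identifier that is stable across nodes (for example a WWID), since /dev/sdX names can differ from node to node. The function name and sample data are hypothetical.

```python
def inconsistent_devices(view):
    """Map each device to the sorted list of nodes that do not see it.

    `view` maps node name -> set of device identifiers visible on that node.
    An empty result means every node sees the same set of devices.
    """
    all_devices = set().union(*view.values())
    missing = {}
    for dev in sorted(all_devices):
        absent = sorted(node for node, devs in view.items() if dev not in devs)
        if absent:
            missing[dev] = absent
    return missing


view = {
    "node-01": {"wwid-a", "wwid-b"},
    "node-02": {"wwid-a", "wwid-b"},
    "node-03": {"wwid-a"},
}
print(inconsistent_devices(view))
```

Any device reported here is one that at least one node could not register with, and therefore one from which that node's key could never need removing, while other nodes' keys on it could not be removed by that node.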
5 Components
Red Hat Enterprise Linux 4.5+ Cluster Suite and Red Hat Enterprise Linux 5+ Advanced Platform provide three components (scripts) to be used in conjunction with SCSI persistent reservations. The fence_scsi_test script provides a means to discover and test devices and report whether or not they are capable of SCSI persistent reservations. The scsi_reserve init script, if enabled, will run at node startup and discover shared storage devices and create registrations/reservations on each device using the node's unique key. The fence_scsi script, if configured as the fencing method, will remove a failed node's registration key from all known devices.
5.1 The fence_scsi_test script
To assist with finding and detecting devices which are (or are not) suitable for use with fence_scsi, a tool has been provided. The fence_scsi_test script will find devices visible to the node and report whether or not they are compatible with SCSI persistent reservations.
The fence_scsi_test script (Red Hat Enterprise Linux 5.5 and Earlier)
The fence_scsi_test script will find all devices visible to a node and report on whether those devices are compatible with SCSI persistent reservations. In Red Hat Enterprise Linux 5.5 and earlier, the fence_scsi_test script only tests if a node can create a registration with the devices. This does not completely test if storage is capable of being used with the fence_scsi agent. Specifically, fence_scsi_test does not check for support of the preempt-and-abort sub-command, which is required.
Two modes are available, and you must explicitly state which mode to use by using the appropriate command-line option:
- Cluster Mode: Specified with the -c flag, this mode is intended for use with an existing cluster environment. This mode will discover all LVM2 cluster volumes and extract the devices within those volumes. Only devices that exist within LVM2 cluster volumes will be tested.
- SCSI Mode: Specified with the -s flag, this mode is intended to test all SCSI devices visible to the node. This is useful when planning the cluster volume configuration. This mode will test all SCSI devices found in the /sys/block/ directory, which may include local SCSI devices.
In both modes, the script will test found devices for compatibility by attempting to register with the devices. Successful registration indicates that the device is capable of performing SCSI persistent reservations. If registration is successful, the script will remove the registration.
NOTE: If fence_scsi_test is run in Cluster Mode and reports devices that have failed the test, you must not use fence_scsi as your fencing method. If fence_scsi_test is run in SCSI Mode and reports failures for devices, those devices must not be used for shared storage (LVM2 cluster volumes) if you wish to use fence_scsi as a fencing method.
The fence_scsi_test script (Red Hat Enterprise Linux 5.6 and Later)
In Red Hat Enterprise Linux 5.6 and later, the fence_scsi_test script is capable of performing more thorough testing. Specifically, you can use the fence_scsi_test script to create registrations and reservations, as well as to test the preempt-and-abort sub-command. You can test without using LVM2 cluster volumes, but they are still required for use with fence_scsi itself. In addition, testing with the fence_scsi_test script requires at least two nodes to be connected to shared storage.
To create registrations, run the following command from a node (node #1) that is connected to shared storage:
% fence_scsi_test -o on -k 123 -d /dev/sdb,/dev/sdc
In this example, the -k option specifies the key value to be used. Any key value may be used as long as it is not already being used by a different node. The -d option specifies the devices on which to create registrations (you can specify a comma-separated list of devices). If the -d option is not specified, the fence_scsi_test script will attempt to create registrations with all LVM2 cluster volumes.
On a different node (node #2) that is attached to the same shared storage devices, run another command to register the same devices:
% fence_scsi_test -o on -k 456 -d /dev/sdb,/dev/sdc
To remove registrations, as would be done when fencing occurs via the fence_scsi agent, run the following command from a node (node #2):
% fence_scsi_test -o off -k 123 -d /dev/sdb,/dev/sdc
In this example, the -k option specifies the key that we want to remove. Note that this command is run on node #2 and is attempting to remove the key used by node #1. This simulates fencing via the fence_scsi agent. Again, the -d option specifies the devices on which to perform this operation. If this command is successful, the key used to register node #1 (123) should be removed from devices /dev/sdb and /dev/sdc.
When testing is complete, you can clear all registrations by using clear:
% fence_scsi_test -o clear -d /dev/sdb,/dev/sdc
Note that this command must be run from a node that is still registered with the devices. A node cannot clear registrations from a device if it is not registered with that device. In this example, only node #2 is registered with the devices, so this command must be run from node #2.
See the fence_scsi_test man page for more information.
5.2 The scsi_reserve init script
Once you have verified that your cluster storage is compatible and meets the requirements necessary to use fence_scsi, you can enable the scsi_reserve init script. This can be done with the following command:
% chkconfig scsi_reserve on
When enabled, the scsi_reserve script handles creation of registrations and reservations at system startup.
NOTE: scsi_reserve should always be enabled to start on boot if cman is enabled to start and fence_scsi is the configured agent for any device in the cluster.
The scsi_reserve init script will first generate the node's unique key. This key is based on the cluster ID and the node ID, thus it is guaranteed to be unique. The next step in the scsi_reserve script depends on which parameter was used. The following options are allowed: start, stop, and status. Each case requires that the cluster manager (cman) be running. This is needed to extract information about the cluster and the individual node.
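To illustrate why a key derived from the cluster ID and node ID is unique, the Python 3 sketch below uses one plausible scheme: the cluster ID in hexadecimal followed by the node ID as four zero-padded hexadecimal digits. The exact format used by scsi_reserve is not stated in this document, so both the format and the function name node_key are assumptions.

```python
def node_key(cluster_id, node_id):
    # Assumed format: hex cluster ID, then the node ID as four hex digits.
    if not 0 < node_id <= 0xFFFF:
        raise ValueError("node_id must fit in four hex digits")
    return "%x%04x" % (cluster_id, node_id)


# Keys for a three-node cluster whose cluster ID is 1234.
keys = [node_key(1234, n) for n in (1, 2, 3)]
print(keys)
```

Because the node ID always occupies a fixed-width suffix, two nodes in the same cluster can only produce the same key if they share a node ID, which is why each node's "nodeid" must be defined explicitly and must not be duplicated.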
% scsi_reserve start
Running the scsi_reserve init script with the 'start' option will create registrations on all devices that were previously discovered. If necessary, it will also create the reservation. The script will report success or failure. Success indicates that the node was able to register with all devices that were discovered. Failure indicates that the script was unable to register with one or more devices. Should a failure occur, the cluster has no way of completely fencing a node in the event of a node failure.
It is important to note that 'scsi_reserve start' should be run before mounting the file system. The reason for this is that if you already have a file system mounted and then create a reservation on any of the devices used by that file system, any node that is not registered with those devices will be unable to write to the file system.
% scsi_reserve stop
When scsi_reserve is run with the 'stop' command, it will attempt to remove the node's registration key from all devices that it registered with at startup. Removing the registration is only a problem if that node is also the reservation holder and other nodes are still registered with the device(s). In this case, the node will not be able to unregister since doing so would also release the reservation. Note that the script will report failure when attempting to remove a node's registration if it is the reservation holder and other registrations exist.
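The rule that governs whether 'stop' can remove a registration reduces to a single check. The Python 3 sketch below is a simplified model of that decision, not the script's actual code; the function name and key strings are hypothetical.

```python
def can_unregister(key, registrations, holder):
    """Return True if `key` may be removed without releasing a shared reservation."""
    others = set(registrations) - {key}
    if key == holder and others:
        # Unregistering the holder would release the reservation
        # that the remaining registrants still depend on.
        return False
    return True


print(can_unregister("node1", {"node1", "node2"}, holder="node1"))
print(can_unregister("node2", {"node1", "node2"}, holder="node1"))
print(can_unregister("node1", {"node1"}, holder="node1"))
```

The first call models the failure case described above: the holder tries to leave while another node is still registered.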
% scsi_reserve status
When the scsi_reserve script is run with the 'status' command, it will list the devices that the node is registered with.
5.3 The fence_scsi agent
The fence_scsi script is the actual fence agent that is run when a node failure occurs. Typically this script will not be run manually, but rather invoked by the fence domain. Using this script manually will remove a node's registrations from all devices, but will not remove the node from the cluster.
When a node is fenced using fence_scsi, the agent simply removes the specified node's registrations from all devices. This prevents write access to those devices. In the special case where the node being fenced is also the reservation holder, the node that is performing the fence operation will become the new reservation holder.
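The effect of a fence operation on a single device, including the hand-off of the reservation described above, can be modeled as follows. This Python 3 sketch is a simplified illustration with hypothetical names and keys; it assumes, as with the clear operation in section 5.1, that the fencing node must itself be registered with the device.

```python
def fence(registrations, holder, fencer_key, victim_key):
    """Return (registrations, holder) after the fencer removes the victim's key."""
    if fencer_key not in registrations:
        raise PermissionError("fencing node must be registered with the device")
    remaining = set(registrations) - {victim_key}
    # If the victim held the reservation, the fencing node takes it over.
    new_holder = fencer_key if holder == victim_key else holder
    return remaining, new_holder


regs, holder = fence({"node1", "node2", "node3"}, "node1",
                     fencer_key="node2", victim_key="node1")
print(sorted(regs), holder)
```

In a real cluster this step is repeated for every device the failed node registered with, which is why all nodes need the same view of storage.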
Note that if the node being fenced has the file system mounted, removing its registrations prevents the node from accessing the file system. This sudden inability to access the devices upon which the file system exists may result in I/O errors and a subsequent withdraw from the file system. This behavior is expected.
6 Configuration
Below is a sample configuration (cluster.conf) for a cluster that uses SCSI persistent reservations as its fence method. Note that each node defines its fence device and passes its node name to the agent via the "node" attribute.
Also note that each node explicitly defines its "nodeid". This is required for all clusters that use fence_scsi as the fence method. The "nodeid" attribute must be defined so that the various SCSI reservation scripts can predictably generate the node's unique registration key.
<?xml version="1.0"?>
<cluster config_version="1" name="my_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="30"/>
  <clusternodes>
    <clusternode name="node-01" votes="1" nodeid="1">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-02" votes="1" nodeid="2">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-03" votes="1" nodeid="3">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-03"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman cluster_id="1234"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="fence_dev"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
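The two configuration rules noted above (an explicit "nodeid" on every node, and a "node" attribute that matches the node's own name) can be verified before the configuration is deployed. The Python 3 sketch below is a hypothetical helper, not a Red Hat tool; it uses only the standard library and checks only these two rules.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of problems found for nodes fenced with fence_scsi."""
    root = ET.fromstring(xml_text)
    # Names of fence devices that are backed by the fence_scsi agent.
    scsi_devices = {
        fd.get("name")
        for fd in root.iter("fencedevice")
        if fd.get("agent") == "fence_scsi"
    }
    problems = []
    for node in root.iter("clusternode"):
        name = node.get("name")
        for dev in node.iter("device"):
            if dev.get("name") not in scsi_devices:
                continue
            if node.get("nodeid") is None:
                problems.append(f"{name}: missing nodeid")
            if dev.get("node") != name:
                problems.append(f"{name}: device node attribute is {dev.get('node')!r}")
    return problems


if __name__ == "__main__":
    import sys
    # Usage: pass the path to a cluster.conf file as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for problem in check_cluster_conf(handle.read()):
                print(problem)
```

Run against the sample configuration above, the check should report no problems, since every node defines "nodeid" and passes its own name to the agent.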
7 References
More information on SCSI Reservations:
- Using SCSI Persistent Reservations with Red Hat Enterprise Linux 6
- How can I diagnose the cause of scsi reservation conflicts in a RHEL cluster using fence_scsi?
- How can I view, create, and remove SCSI reservations and keys?