Design Guidance for RHEL High Availability Clusters - IBM z/VM Instances as Cluster Members
Contents
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7, 8, 9 with the High Availability Add-On
- RHEL High Availability cluster running on IBM z/VM guests
Recommended Prior Reading
Useful References and Guides
Introduction
This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on z/VM virtual machines.
z Systems / z/VM Environment Configuration
Red Hat support policies: See Red Hat's support policies for information on requirements for deploying IBM z Systems for usage with RHEL High Availability and RHEL Resilient Storage (if it will be used).
- Support Policies for RHEL High Availability Clusters - IBM z Systems as a cluster platform
- Support Policies for RHEL Resilient Storage Clusters - Resilient Storage on IBM z Systems
Configure IBM z Systems
The setup of z Systems generally does not require many special modifications to accommodate RHEL High Availability or Resilient Storage. However, there are a few optional changes that can help address certain conditions or enable certain functionality in RHEL HA, which the sections below touch on. For awareness during these early setup stages, the changes to consider are:
- Set up z/VM SMAPI for use with power-based fencing
- If using z/VM SMAPI power-fencing, set up a dedicated user for fencing that has IMAGE_ACTIVATE, IMAGE_DEACTIVATE, IMAGE_STATUS_QUERY, CHECK_AUTHENTICATION, and IMAGE_NAME_QUERY_DM permissions
- Reduce the z/VM DirMaint sleep duration from 2 minutes to 10 seconds
Again - these changes are detailed more specifically in subsequent steps of this guide.
Configuring DASD shared storage for cluster members: If the cluster will utilize shared storage for cluster workloads, then DASD devices should be presented to the VMs for this purpose.
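As a hedged sketch of presenting such a device to a member (the device number 0.0.0201 is illustrative, not from this guide), a DASD can be brought online on RHEL with the standard s390 tooling:

```shell
# Remove the device from the I/O ignore list so Linux can see it
# (0.0.0201 is an illustrative device number).
cio_ignore -r 0.0.0201

# Set the channel-attached device online.
chccwdev -e 0.0.0201

# Verify the DASD is online and note its block device name (e.g. dasdc).
lsdasd 0.0.0201
```

To make the device persist across reboots, it would also need to be listed in /etc/dasd.conf or on the `rd.dasd=` kernel parameter, depending on the configuration approach in use.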
RHEL High Availability Cluster Configuration
Membership design and member-specs
There are no particular considerations specific to z/VM guests regarding member specifications or how to lay out the members.
As usual, designing the cluster requires assessing the needs of the workload(s) that will run there, assessing the needs of the cluster to withstand various failure scenarios, and considering how much redundancy is required.
Keep in mind that deploying a cluster's VMs across different CPCs or "frames" introduces conditions that require treating that cluster as a "multi-site" deployment, and consideration should be given to how the cluster should be designed to achieve its goals under those conditions.
STONITH Recommendations: Use sbd fencing as the primary method
- For detailed instructions, see: Administrative procedure - Enabling sbd fencing in RHEL 7
For example, if using Red Hat's recommended design:
# pcs stonith sbd device setup --device=/dev/dasdc1
# pcs stonith sbd enable --device=/dev/dasdc1
# pcs property set stonith-watchdog-timeout=10
# pcs stonith create sbd fence_sbd devices=/dev/dasdc1
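After enabling sbd, the configuration can be verified; a hedged sketch, assuming the same /dev/dasdc1 device as above:

```shell
# Show whether sbd is enabled on the cluster nodes and which
# devices are configured.
pcs stonith sbd status

# List the node slots and any pending fence messages on the
# shared sbd device.
sbd -d /dev/dasdc1 list
```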
STONITH Recommendations: Set up fence_zvmip z/VM SMAPI power fencing as secondary method
A RHEL High Availability cluster does not strictly require two layers of fencing, but it certainly can be beneficial to have the extra redundancy in STONITH methods. fence_zvmip can be a useful backup to sbd, or can serve as the primary method if sbd is not possible or desired for some reason.
- For detailed instructions, see: Administrative procedure - Configuring z/VM SMAPI Fencing with fence_zvmip for RHEL 7 IBM z Systems Cluster Members
STONITH recommendation with multiple LPARs in an SSI cluster and fence_zvmip fencing
If using fence_zvmip for STONITH, and the cluster is composed of guests from multiple LPARs in an SSI cluster, then the cluster will need multiple fence_zvmip devices - one per LPAR.
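A hedged sketch of this layout (hostnames, node names, z/VM userids, and credentials are all illustrative): each fence_zvmip device points at the SMAPI server of one LPAR's z/VM instance and is mapped only to the guests running there:

```shell
# Fence device for the guests hosted on LPAR1's z/VM instance.
# pcmk_host_map maps cluster node names to their z/VM userids.
pcs stonith create fence_lpar1 fence_zvmip \
    ip=zvm-lpar1.example.com username=FENCEUSR password=secret \
    pcmk_host_map="node1:NODE1;node2:NODE2"

# Fence device for the guests hosted on LPAR2's z/VM instance.
pcs stonith create fence_lpar2 fence_zvmip \
    ip=zvm-lpar2.example.com username=FENCEUSR password=secret \
    pcmk_host_map="node3:NODE3;node4:NODE4"
```

With this arrangement, a fence request for any given node is routed to the fence device of the LPAR that actually hosts it.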
STONITH recommendation with fence_zvmip z/VM SMAPI power-fencing: Decrease z/VM DirMaint sleep duration
If using fence_zvmip for STONITH, adjusting the sleep duration of z/VM's directory maintenance service (DirMaint), which occurs nightly around midnight, can help avoid STONITH timeouts.
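If fence operations still run long during that nightly window, the cluster-side timeout for the fence device can also be raised using the standard fence-agent `power_timeout` parameter; a sketch with an illustrative device name and value:

```shell
# Allow up to 120 seconds for power on/off actions to complete
# before the fence agent reports a timeout (device name is
# illustrative).
pcs stonith update fence_lpar1 power_timeout=120
```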