Design Guidance for RHEL High Availability Clusters - IBM z/VM Instances as Cluster Members

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 7, 8, 9 with the High Availability Add-On
  • RHEL High Availability cluster running on IBM z/VM guests

Introduction

This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on z/VM virtual machines.

z Systems / z/VM Environment Configuration

Red Hat support policies: See Red Hat's support policies for the requirements for deploying IBM z Systems for use with RHEL High Availability and, if it will be used, RHEL Resilient Storage.


Configure IBM z Systems

The setup of z Systems generally does not require many special modifications to accommodate RHEL High Availability or Resilient Storage. However, there are a few optional changes that help address certain conditions or enable certain functionality in RHEL HA, which the sections below touch on. For awareness during these early setup stages, the changes to consider are:

  • Set up z/VM SMAPI to use with power-based fencing
    • If using z/VM SMAPI power-fencing, set up a dedicated user for fencing that has IMAGE_ACTIVATE, IMAGE_DEACTIVATE, IMAGE_STATUS_QUERY, CHECK_AUTHENTICATION, and IMAGE_NAME_QUERY_DM permissions
  • Reduce the z/VM DirMaint sleep duration from 2 minutes to 10 seconds
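Once the SMAPI fencing user exists, its permissions can be verified from any cluster member with a status call. A sketch, in which the SMAPI server address, user name, password, and guest name are all placeholders:

```shell
# Query the power status of guest GUEST1 through the z/VM SMAPI server.
# A correctly authorized fencing user returns the guest's on/off status.
fence_zvmip --ip=zvmsmapi.example.com \
            --username=FENCEUSR \
            --password='secret' \
            --plug=GUEST1 \
            --action=status
```

If the call fails with an authorization error, recheck that the user has the permissions listed above.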

These changes are described in more detail in subsequent sections of this guide.


Configuring DASD shared storage for cluster members: If the cluster will utilize shared storage for cluster workloads, then DASD devices should be presented to the VMs for this purpose.
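A newly attached DASD must be brought online on each member before it can be partitioned and used. A sketch using the standard s390-tools commands, where the device number 0.0.0200 is an example:

```shell
# Remove the device from the I/O ignore list, if present
cio_ignore -r 0.0.0200

# Set the DASD online
chccwdev -e 0.0.0200

# Confirm the device is online and note its /dev/dasd* name
lsdasd 0.0.0200
```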


RHEL HIGH AVAILABILITY CLUSTER CONFIGURATION

Membership design and member-specs

There are no considerations specific to z/VM guests to call out regarding member sizing or member layout.

As usual, designing the cluster requires assessing the needs of the workload(s) that will run there, assessing the needs of the cluster to withstand various failure scenarios, and considering how much redundancy is required.

Keep in mind that deploying a cluster's VMs across different CPCs or "frames" introduces conditions that warrant treating that cluster as a "multi-site" deployment, and consideration should be given to how the cluster should be designed to achieve its goals under those conditions.


STONITH Recommendations: Use sbd fencing as the primary method

For example, if using Red Hat's recommended design, where /dev/dasdc1 is a partition on shared DASD visible to all cluster members:

# pcs stonith sbd device setup --device=/dev/dasdc1
# pcs stonith sbd enable --device=/dev/dasdc1
# pcs property set stonith-watchdog-timeout=10
# pcs stonith create sbd fence_sbd device=/dev/dasdc1
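sbd also requires a working watchdog device on every member; on z/VM guests this is typically provided by the DIAG 288 watchdog driver. A sketch of loading the module and making it persistent across reboots:

```shell
# Load the z/VM DIAG 288 watchdog driver
modprobe diag288_wdt

# Verify that the watchdog device node now exists
ls -l /dev/watchdog

# Load the driver automatically at boot
echo diag288_wdt > /etc/modules-load.d/watchdog.conf
```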

STONITH Recommendations: Set up fence_zvmip z/VM SMAPI power fencing as secondary method

A RHEL High Availability cluster does not strictly require two layers of fencing, but it certainly can be beneficial to have the extra redundancy in STONITH methods. fence_zvmip can be a useful backup to sbd, or can serve as the primary method if sbd is not possible or desired for some reason.
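When used as a backup, fence_zvmip can be layered behind sbd with fencing levels. A sketch for a two-node cluster, in which the SMAPI address, credentials, node names, and guest names are all placeholders:

```shell
# Create the z/VM SMAPI power-fencing device, mapping cluster node
# names to their z/VM guest names
pcs stonith create zvmfence fence_zvmip \
    ip=zvmsmapi.example.com username=FENCEUSR password='secret' \
    pcmk_host_map="node1:GUEST1;node2:GUEST2"

# Try sbd first (level 1), fall back to fence_zvmip (level 2)
pcs stonith level add 1 node1 sbd
pcs stonith level add 2 node1 zvmfence
pcs stonith level add 1 node2 sbd
pcs stonith level add 2 node2 zvmfence
```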


STONITH recommendation with multiple LPARs in an SSI cluster and fence_zvmip fencing

If using fence_zvmip for STONITH, and the cluster is composed of guests from multiple LPARs in an SSI cluster, then the cluster will need multiple fence_zvmip devices, one per LPAR.
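For example, with guests spread across two LPARs, each LPAR gets its own fence_zvmip device, and each device's host map lists only the guests hosted on that LPAR. All addresses and names below are placeholders:

```shell
# One fence_zvmip device per LPAR, each pointing at that LPAR's SMAPI server
pcs stonith create zvmfence_lpar1 fence_zvmip \
    ip=smapi-lpar1.example.com username=FENCEUSR password='secret' \
    pcmk_host_map="node1:GUEST1"

pcs stonith create zvmfence_lpar2 fence_zvmip \
    ip=smapi-lpar2.example.com username=FENCEUSR password='secret' \
    pcmk_host_map="node2:GUEST2"
```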


STONITH recommendation with fence_zvmip z/VM SMAPI power-fencing: Decrease z/VM Dirmaint sleep duration

If using fence_zvmip for STONITH, adjusting the sleep duration of z/VM's Directory Maintenance Facility (DirMaint) processing, which occurs nightly around midnight, can help avoid STONITH timeouts.

