Administrative Procedures for RHEL High Availability Clusters - Stopping a RHEL 7 Cluster Node

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On

Situations Where This Procedure May Be Useful

  • Maintenance is to be performed on one or a subset of cluster nodes and access to shared resources is not required
  • Activity will occur that may disrupt a cluster member's ability to communicate with other members
  • Unexpected behavior is occurring on a node within a cluster and there is a desire to reboot or restart it to get back to a "clean" state
  • Rolling software updates will be applied to the cluster nodes one at a time

What This Procedure Accomplishes

A cluster node is considered "stopped" when it is no longer an active member of the cluster and its core cluster daemons provided by pacemaker are not running.

This procedure is thus aimed at taking a cluster node out of a cluster membership temporarily so that it ends its participation in any resource management or recovery, fencing activity, member monitoring, or other operations carried out by the cluster. This will remain the case until the cluster is started again on that node, or until it is rebooted with the pacemaker service enabled to start on boot.

Procedure: Stopping the Cluster on a Node

Consideration: Hard Stop vs Graceful Exit

The circumstances surrounding this cluster stop operation may sometimes make it better to remove the node in focus from the cluster forcefully, rather than having it leave in a clean manner. If there are ongoing problems in the cluster, attempting to stop cleanly may simply result in failed operations, indefinite waits for operations to complete, membership transitions, fencing, or other unwanted behavior. Rebooting the node hard, or otherwise exiting the cluster without taking graceful-exit steps first, may result in this node being fenced, but that outcome may be the faster path to a functional cluster.

If any of these conditions are present, it may be better to try to remove the node forcefully through a power cycle or hard reboot rather than the relocate/standby/stop procedure detailed here:

  • Nodes seem to be unable to communicate with each other upon cluster start and are becoming blocked or otherwise unusable shortly after.
  • Resilient Storage - gfs2, lvm2-cluster (clvmd), cmirror - is in use and the cluster is not currently stable in its membership, or other unexpected behavior is occurring with any of those components.
  • Cluster-initiated operations or cluster-related commands are not returning promptly.

In these situations, consider using this alternate diagnostic procedure to hard stop a node.
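A hard stop of this kind typically amounts to forcing an immediate reboot without stopping cluster services first. As a minimal sketch (assuming the kernel SysRq interface is available; the node will likely be fenced by the surviving members), that can be done as follows:

```shell
# Enable the SysRq interface if it is not already enabled.
echo 1 > /proc/sys/kernel/sysrq

# Force an immediate reboot with no clean shutdown of services.
echo b > /proc/sysrq-trigger
```

Because no clean shutdown occurs, unsynced filesystem data on the node may be lost; this trade-off is usually acceptable when the goal is to restore a functional cluster quickly.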


Optional Task: Relocate Resources to Chosen Nodes

The subsequent "Put Node in Standby Mode" and "Stop pacemaker Service" steps will automatically move resources to another available node, so administrator action is not strictly necessary to relocate resources. However, it may sometimes be preferable to choose where resources will go, rather than rely on the cluster to decide where to place them.

If there is a preference for any resources to run in a specific location, take action to move the resources there now.
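One way to do this is with pcs resource move, sketched below; the resource name "webserver" and node "node2.example.com" are hypothetical placeholders:

```shell
# Move a resource to a chosen node before stopping this one.
pcs resource move webserver node2.example.com

# "pcs resource move" works by adding a location constraint. Once the
# node is back in service, clear that constraint so the cluster can
# again place the resource freely:
pcs resource clear webserver
```

Verify placement with pcs status before proceeding to the next step.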


Optional Task: Put Node in Standby Mode

It is not necessary to "standby" a node before exiting, but it can be useful in several ways, including the following:

  • Keep node ready to return to service while resources move: The cluster will move resources to alternate locations, and in standby mode this node can remain ready to return to active service in case something unexpected happens during that movement. Once it is clear that everything is stable without this node managing resources, the node can be fully stopped.
  • Prevent node from returning to service immediately upon next start: Standby mode persists across multiple stops and starts until the node is explicitly taken out of standby. Entering standby mode before leaving therefore ensures that on its next start the node will not return to service until an administrator decides it is ready. This can be useful when changes will be applied to the node that create some uncertainty about whether it will function properly, and there is a desire to reintegrate it into the cluster step-by-step to avoid unnecessary service disruption.

If any of these benefits are desirable, then it may be best to put the node in standby mode first. Just remember to take the node out of standby mode when it is time to return to service.
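On RHEL 7, standby mode is controlled with pcs cluster standby/unstandby; the node name below is a hypothetical placeholder:

```shell
# Put the node into standby: it remains a cluster member but will
# not run resources, and the cluster moves its resources elsewhere.
pcs cluster standby node1.example.com

# Later, when the node should manage resources again:
pcs cluster unstandby node1.example.com
```

If the node name is omitted, the command applies to the local node.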


Task: Stop pacemaker Service

From the node that is to be stopped, execute the following command:

# pcs cluster stop

Alternatively, the target node can be specified, allowing the command to be executed from any node in the cluster (syntax: pcs cluster stop [node]):

# pcs cluster stop node1.example.com

If all nodes should be stopped, then specify the --all flag:

# pcs cluster stop --all
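After stopping, it may be worth confirming that the node is truly out of the membership. A hedged sketch (the node name is a hypothetical placeholder):

```shell
# On the stopped node: confirm the cluster daemons are down.
systemctl is-active pacemaker corosync

# From a remaining active node: the stopped node should be listed
# as offline in the node status output.
pcs status nodes
```

Both pacemaker and corosync should report "inactive" on the stopped node, and the node should appear in the Offline list from the perspective of the remaining members.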
