Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster
Introduction
With one of the primary responsibilities of a High Availability or Resilient Storage cluster being to provide continuous service for applications or resources, it is especially important that updates be applied in a systematic and consistent fashion to avoid any potential disruption to the availability of those critical services. This document aims to outline Red Hat's recommended practices for applying updates to the cluster software itself and to the software comprising the base RHEL operating system, libraries, and utilities.
Environment
- Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8 or 9 with the High Availability or Resilient Storage Add On
- One or more pieces of software installed on cluster nodes or remote nodes that must be updated
Contents
Updating Software Packages in a RHEL High Availability and/or Resilient Storage Cluster
Important Notes
- WARNING: It is critical when performing software-update procedures for RHEL High Availability and Resilient Storage clusters to ensure that any node that will undergo updates is not an active member of the cluster before those updates are initiated. Swapping out the software that the cluster stack relies on while it is in use can lead to various problems and unexpected behaviors, including but not limited to issues that can cause complete outages of the cluster and services it is managing.
- Red Hat does not support in-place upgrades or rolling-upgrades of cluster nodes, remote nodes, and bundle container images from one major release of RHEL to another except for the limited exceptions noted below. For example, there is no supported method for updating some nodes in a cluster from RHEL 6 to RHEL 7, introducing them into the cluster with existing RHEL 6 nodes to take over resources from them, and then updating the remaining RHEL 6 nodes. Upgrades in major releases of RHEL must be done by migrating services from a running cluster on the old release to another cluster running the new release.
- Upgrade of systems using the High Availability Add-On from RHEL 6 to RHEL 7 is unsupported.
- Upgrade of systems using the High Availability Add-On from RHEL 7 to RHEL 8 is unsupported.
- In-place upgrade of systems using the High Availability Add-On is generally unsupported. The only supported rolling upgrades of cluster nodes, remote nodes, and bundle container images are listed below.
- RHEL 8.8+ to RHEL 9.2+. For more information on limitations and the procedure, see: Procedure to upgrade a RHEL 8 High Availability cluster to RHEL 9. Note that it is not supported to have Resilient Storage packages installed on cluster nodes while performing the procedure.
- RHEL 9+ to RHEL 10+. For more information on limitations and the procedure, see: Procedure to upgrade a RHEL 9 High Availability cluster to RHEL 10. Note that it is not supported to have Resilient Storage packages installed on cluster nodes while performing the procedure.
- Red Hat does not support rolling upgrades of shared storage that is exported with samba+ctdb: Does ctdb shared storage support rolling upgrades?
- While in the process of performing an update, do not make any changes to your cluster configuration. For example, do not add or remove resources or constraints.
- Although it is not required, when upgrading a Pacemaker cluster it is good practice to upgrade all cluster nodes before upgrading any Pacemaker Remote nodes or podman (Docker) containers used in bundles.
- Red Hat supports applying kernel live patches on member nodes of a RHEL High Availability or Resilient Storage cluster. Please see the Recommended Practices for using Kernel Live Patching in RHEL High Availability or Resilient Storage Clusters article for recommended practices and supported RHEL and kernel versions.
- The Red Hat Enterprise Linux (RHEL) Resilient Storage Add-On is no longer supported starting with RHEL 10 and any subsequent releases. The RHEL Resilient Storage Add-On will continue to be supported on earlier versions of RHEL (7, 8, and 9) throughout their respective maintenance support lifecycles.
Please feel free to contact Red Hat Global Support Services for assistance in planning an update, upgrade, or migration of any kind. Proper planning and risk mitigation is key to a successful update or migration, and Red Hat's experts can assist in ensuring the process goes as smoothly as possible.
General Overview of Update Procedures
Updating packages that make up the RHEL High Availability and Resilient Storage Add-Ons, either individually or as a whole, can be done in one of two general ways:
- Rolling Updates: The basic idea is to take a fully formed and active cluster, remove one node from service by stopping its relevant services and daemons, update its software, then integrate it back into the cluster before repeating the procedure on another node. This allows the cluster to continue providing service and managing resources while each node is updated, and allows the updated node(s) to provide service while bringing the remaining node(s) up to the same software level. The node undergoing an update at each stage should not be a member of the cluster while the update is ongoing.
- Entire Cluster Update: When a cluster is able to undergo a complete outage, it can simplify update procedures greatly. Such situations allow for stopping the entire cluster, applying updates to all nodes simultaneously (or one after another, if preferred), and then starting the cluster back up together. One of the primary benefits of such a procedure is that there is no time when nodes should be running separate versions of the software, thereby eliminating any risk of incompatibilities or unexpected behavior due to such mismatches. This option also eliminates any complexity that might exist with repeatedly moving resources around in the cluster to accommodate each node stopping and then rejoining.
Risks and Considerations
- When performing a Rolling Update, the presence of different versions of the High Availability and Resilient Storage software within the same cluster introduces a risk of unexpected behavior. While Red Hat does seek to eliminate any known incompatibilities between different releases within the same major release of RHEL, it performs only limited testing of different versions of the software operating simultaneously. It is always possible that some previously unforeseen incompatibility between versions could cause unexpected behavior, so the only way to completely eliminate this risk is to use the Entire Cluster Update method.
- New software versions always come with the potential for unexpected behavior, changes in functionality that may require advance preparation, or in rare cases, bugs that could impact the operation of the product. Red Hat strongly recommends having a test, development, or staging cluster configured identically to any production clusters, and rolling out any updates to such a cluster first for thorough testing prior to the roll-out in production.
- Performing a Rolling Update necessarily means reducing the overall capacity and redundancy within the cluster. The size of the cluster dictates whether the absence of a single node poses a significant risk, with larger clusters obviously being able to absorb more node failures before reaching the critical limit, and with smaller clusters being less capable or not capable at all of withstanding the failure of another node while one is missing. It is important that the potential for failure of additional nodes during the update procedure be considered and accounted for. If at all possible, taking a complete outage and updating the cluster entirely may be the preferred option so as to not leave the cluster operating in a state where additional failures could lead to an unexpected outage.
- Updates to the pacemaker package sometimes bring a change in Pacemaker's crm_feature_set. This can introduce a risk when performing rolling updates, as the cluster requires all nodes to run the same crm_feature_set; if it changes, all nodes must be updated before the updated ones can rejoin. We recommend checking whether your update will bring a change in crm_feature_set, and testing the update procedure in a test or staging environment before updating production.
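As a rough illustration, a dotted-version comparison such as the following can flag a crm_feature_set change before a rolling update begins. The helper names and the decision logic are our own sketch, not part of any Red Hat tooling; it relies on GNU sort's `-V` (version sort):

```shell
#!/bin/sh
# Hypothetical helpers: compare two crm_feature_set strings (dotted numeric
# versions such as "3.16.2") to spot a feature-set change before starting a
# rolling update. Uses GNU sort's -V option for version ordering.

feature_set_differs() {
    # True (exit 0) if the two feature-set strings are not identical.
    [ "$1" != "$2" ]
}

feature_set_older() {
    # True (exit 0) if $1 sorts strictly before $2 as a version string.
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Example: if the current nodes run feature set 3.15.0 and the updated
# packages bring 3.16.2, the cluster will have mixed feature sets until
# every node has been updated.
if feature_set_older "3.15.0" "3.16.2"; then
    echo "feature set changes: plan to update all nodes"
fi
```

The same comparison can be applied to any pair of feature-set strings taken from the release notes of the old and new pacemaker packages.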
Procedure for Rolling Updates
The specific steps to follow differ depending on the RHEL release and style of cluster in use.
- Only run pcs commands on updated nodes whenever possible. The newer version should always be backward-compatible, but the older version may not be forward-compatible.
- Once the last active older cluster node has been taken out of the cluster, no older cluster node will be able to rejoin without being updated first.
RHEL 6, 7, 8 and 9 Clusters using pacemaker
Perform the following steps to update the base RHEL packages, High Availability Add-On packages, and/or Resilient Storage Add-On packages on each node in a rolling-fashion:
- Choose a single node where the software will be updated. To help reduce downtime, we recommend first updating nodes running the smallest number of services, or a cluster node that is the passive node for promotable Pacemaker-managed resources. If any preparations need to be made before stopping or moving the resources or software running on that node, carry out those steps now.
- Before starting the procedure, it is recommended that the pacemaker systemd service be disabled from starting at boot. Disabling pacemaker from starting at boot will prevent pacemaker from starting until it has been verified that all components managed by the cluster still work as expected.
The cluster stack can be disabled from starting on boot on this chosen node with:
# Syntax:
# pcs cluster disable [<node>]

# Example:
# pcs cluster disable node1.example.com
Enabling pacemaker to start at boot should only be done after it has been verified that the cluster is still able to manage all the cluster-managed resources without issues.
- If the cluster is composed of 3 or more cluster nodes then move any managed resources off of this node as needed. If there are specific requirements or preferences for where the resources should be relocated to, then consider creating new location constraints to place the resources on the correct node. The location of resources can be strategically chosen to result in the least number of moves throughout the Rolling Update procedure, rather than moving resources in preparation for every single node update.
Otherwise, if allowing the cluster to manage placement of resources on its own is acceptable, the next step will automatically take care of this.
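For example, a temporary location constraint can pin a resource to a specific node while its current host is updated. The resource and node names below are placeholders, and the `run` wrapper only echoes each command so this reads as a dry run; on a real cluster you would execute the pcs commands directly:

```shell
#!/bin/sh
# Sketch: temporarily pin a hypothetical resource "webserver" to node2 while
# node1 is updated, then drop the constraint afterwards. The run wrapper
# echoes instead of executing, so this is a dry run only.
run() { echo "+ $*"; }

# Prefer node2 while node1 is out of service (score defaults to INFINITY).
run pcs constraint location webserver prefers node2.example.com

# ... update node1 here ...

# Remove the temporary constraint so normal placement applies again.
# Look up the actual constraint id with `pcs constraint --full`; the id
# below follows the usual auto-generated naming and is illustrative.
run pcs constraint remove location-webserver-node2.example.com-INFINITY
```

Placing constraints this way, and removing them at the end of the whole procedure, keeps the number of resource moves to a minimum across the rolling update.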
- Place the chosen node in standby mode to ensure it is not considered in service, and to cause any remaining resources to be relocated elsewhere or stopped. Before proceeding to step 5, monitor pcs status to make sure all resources have been moved off the node being updated.
# Syntax:
# pcs node standby [<node>]

# Example:
# pcs node standby node1.example.com
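Before stopping the cluster software, the standby node should hold no resources. A small parser over `pcs status resources`-style output can be used in a wait loop; the helper name and the sample output format below are our own illustration, as the exact output varies by pcs version:

```shell
#!/bin/sh
# Hypothetical check: count lines of `pcs status resources`-style output
# that show a resource Started on the given node. The output format varies
# by pcs version, so treat the pattern as illustrative only.
resources_on_node() {
    # $1 = status text, $2 = node name; prints the number of matching lines.
    printf '%s\n' "$1" | grep -c "Started[[:space:]]*$2" || true
}

# Sample status text after node1 was placed in standby: everything has
# already moved to node2.
status='  * webserver   (ocf:heartbeat:apache):  Started node2.example.com
  * virtual-ip  (ocf:heartbeat:IPaddr2): Started node2.example.com'

# On a live cluster one might loop until the count reaches zero, e.g.:
#   while [ "$(pcs status resources | grep -c "Started.*$NODE")" -gt 0 ]; do sleep 5; done
echo "resources still on node1: $(resources_on_node "$status" node1.example.com)"
```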
- Stop the cluster software on the chosen node using pcs:
# Syntax:
# pcs cluster stop [<node>]

# Example:
# pcs cluster stop node1.example.com
- Perform any necessary software updates on the chosen node. There are various methods for doing so that are outside the scope of this article. Consult the general instructions for installing High Availability and Resilient Storage software, Knowledge Content in the Customer Portal, and/or the Product Documentation. After a cluster node has been updated, manually verify that all components on the node that are managed by the cluster are still working as expected; this avoids triggering unnecessary failures when the cluster stack is started again and the node resumes managing cluster resources.
- If any software was updated that necessitates a reboot, prepare to perform that reboot. It is recommended that cluster software be disabled from starting on boot so that the host can be checked to ensure it is fully functional on its new software versions before bringing it into the cluster, as noted in step 2.
Perform the reboot when ready, and when complete, ensure the host seems to be fully functional and is using the correct software in any relevant areas (such as having booted into the latest kernel). If anything does not seem correct, then do not proceed until the situation is resolved. Contact Red Hat Global Support Services for assistance if needed.
- Rejoin the updated node into the cluster.
# Syntax:
# pcs cluster start [<node>]

# Example:
# pcs cluster start node1.example.com
Check pcs status output to determine if everything appears as it should. Once the node seems to be functioning properly, reactivate it for service by taking it out of standby mode:
# Syntax:
# pcs node unstandby [<node>]

# Example:
# pcs node unstandby node1.example.com
- If any temporary location constraints were created earlier to control the placement of resources, then adjust or remove them to allow resources to go back to their normally preferred locations.
- After verifying that all pacemaker-managed cluster resources are able to run on the cluster node, enable pacemaker to start at boot.
# Syntax:
# pcs cluster enable [<node>]

# Example:
# pcs cluster enable node1.example.com
- Repeat steps 1-8 for each remaining node.
- After all the cluster nodes or remote nodes have been upgraded, run the following:

# pcs cluster cib-upgrade
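The per-node sequence above can be summarized as a short script. This is a sketch only: the `run` wrapper echoes each command rather than executing it, `yum update -y` is just a stand-in for whatever update method you use, and the manual verification between steps is deliberately omitted:

```shell
#!/bin/sh
# Dry-run sketch of the rolling-update sequence for one node. On a real
# cluster each step needs verification before the next, and the commands
# would be executed rather than echoed.
NODE=${1:-node1.example.com}
run() { echo "+ $*"; }

run pcs cluster disable "$NODE"   # keep the stack from starting at boot
run pcs node standby "$NODE"      # drain resources off the node
run pcs cluster stop "$NODE"      # stop the cluster software
run yum update -y                 # apply the updates (stand-in command)
run pcs cluster start "$NODE"     # rejoin the cluster
run pcs node unstandby "$NODE"    # put the node back in service
run pcs cluster enable "$NODE"    # re-enable start at boot once verified
```

Running the sketch prints the seven commands in order, which can serve as a checklist when working through the procedure by hand.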
RHEL 5 and 6 with cman (no pacemaker)
Perform the following steps to update the base RHEL packages, High Availability Add-On packages, and/or Resilient Storage Add-On packages on each node in a rolling-fashion:
1.) Choose a single node where the software will be updated. If any preparations need to be made before stopping or moving the resources or software running on that node, carry out those steps now.
2.) Move any managed resources off of this node as needed. If there are specific requirements or preferences for where resources should be relocated to, then consider moving them with clusvcadm, the Conga web administration interface, or ccs (RHEL 6 only) to place the resources on the correct node. Otherwise if allowing the cluster to manage placement of resources on its own is acceptable, then the next step will automatically take care of this.
3.) Stop all running cluster daemons on this chosen node. In RHEL 6 this can be done easily using ccs:
# Syntax:
# ccs -h <hostname> --stop

# Example:
# ccs -h node1.example.com --stop
NOTE: This also disables the cluster daemons from starting on boot.
4.) Perform any necessary software updates. There are various methods for doing so that are outside the scope of this article. Consult the general instructions for installing High Availability and Resilient Storage software, Knowledge Content in the Customer Portal, and/or the Product Documentation.
5.) If any software was updated that necessitates a reboot, prepare to perform that reboot. It is recommended that cluster software be disabled from starting on boot so that the host can be checked to ensure it is fully functional on its new software versions before bringing it into the cluster. The cluster daemons can be disabled from starting on boot on this chosen node with chkconfig <service> off, or using ccs as seen in step 3 above.
Perform the reboot when ready, and when complete, ensure the host seems to be fully functional and is using the correct software in any relevant areas (such as having booted into the latest kernel). If anything does not seem correct, then do not proceed until the situation is resolved. Contact Red Hat Global Support Services for assistance if needed.
Once everything appears to be set up correctly, re-enable the cluster daemons on this node using chkconfig, or with ccs as described in step 6 below.
6.) Rejoin the updated node into the cluster by starting the cluster daemons, or by using ccs in RHEL 6:
# Syntax:
# ccs -h <hostname> --start

# Example:
# ccs -h node1.example.com --start
NOTE: This enables cluster daemons to start automatically on boot.
Once all daemons are started, the node should be fully functional and managing resources within the cluster.
7.) Repeat steps 1-6 for each remaining node.
Procedure for Entire Cluster Update
The process for updating an entire cluster at once is nearly identical to the Rolling Update procedure above, with the single difference being that each step should be performed on all nodes before moving on to the next step. So, for example, stop the cluster daemons on each node before moving on to updating the software, and reboot each node before moving on to re-enabling the cluster software, etc. In the end, the goal is to stop the cluster software on all nodes, update those nodes, then start the cluster software again. The above steps can be used as a guide, and may even be simplified to skip some of the preparation steps if they are not required.
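Under the same dry-run convention used above (echoed commands, placeholder node names, `yum update -y` as a stand-in update method), the entire-cluster variant collapses to stopping everything, updating each node, and starting everything back up:

```shell
#!/bin/sh
# Dry-run sketch of an entire-cluster update: stop the cluster on all
# nodes, update each node, then start the cluster on all nodes together.
# The node list is a placeholder and `run` echoes instead of executing.
NODES="node1.example.com node2.example.com node3.example.com"
run() { echo "+ $*"; }

run pcs cluster stop --all          # stop the cluster software everywhere
for node in $NODES; do
    run ssh "$node" yum update -y   # update each node (stand-in command)
done
run pcs cluster start --all         # start the whole cluster back up
```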