When to use "maintenance-mode" in a Pacemaker-based RHEL High Availability Add-On cluster?
Environment
- Red Hat Enterprise Linux (RHEL) 6, 7, 8 or 9 with the High Availability Add-On
- Pacemaker
Issue
- Need to know the recommended use of maintenance-mode in a RHEL High Availability Add-On cluster on RHEL 7, 8 and 9.
- Can I use maintenance-mode=true while patching?
- Can I set the cluster to maintenance-mode while patching SAP HANA environments?
Resolution
Maintenance mode (for a pacemaker cluster) is a property that can be enabled when you want to safely perform operations that could otherwise trigger resource migration or fencing. Once the maintenance activity is completed, the property can be unset.
Examples of when to use maintenance mode:
- You need to perform disk maintenance on a node (for example, node1), which could temporarily cause the application resource to stop or fail.
- You are changing a resource's configuration (e.g., IP address, port) and want to test it without Pacemaker intervening.
What happens when you enable maintenance-mode in a Pacemaker environment?
- Pacemaker stops monitoring resources, so failures won’t trigger recovery
- No start, stop, promote, or demote actions will be taken by the cluster
- Fencing (STONITH) will not trigger due to resource failures*
When you should NOT use maintenance-mode:
- When patching cluster nodes
- When upgrading packages on cluster nodes
- When performing security patching on cluster nodes
- When upgrading SAP Environments
- When attempting an unsupported live migration or snapshot of VMs while the cluster stack is running
- During production failures or real outages
- On a shared storage node during I/O issues
- For routine changes without justification
- To prevent fencing
- For prolonged periods without justification
*NOTE: Maintenance mode only prevents resource-related events from causing fencing. It does NOT prevent node-level events, such as network loss, a hung or panicked node, or an ungraceful reboot or shutdown, from being resolved by fencing.
To set maintenance mode on the cluster:
# pcs property set maintenance-mode=true
To unset maintenance mode on the cluster:
# pcs property unset maintenance-mode
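Before and after maintenance work it can help to confirm which state the cluster is actually in. The following is a minimal sketch, not an official procedure. The function name maintenance_mode_state is hypothetical. It assumes that `pcs property config` is available (recent pcs releases; on older releases `pcs property show` plays the same role and can be passed as the first argument), and it assumes the property is printed either as `maintenance-mode: true` or as `maintenance-mode=true`.

```shell
# Print the configured value of maintenance-mode, or "false" when the
# property is not set (false is the Pacemaker default).
# $1 (optional): pcs property subcommand used to query, default "config".
# Assumption: the output contains a line such as " maintenance-mode: true"
# or "  maintenance-mode=true"; adjust the parsing if your pcs differs.
maintenance_mode_state() {
    mm_query=${1:-config}
    mm_value=$(pcs property "$mm_query" maintenance-mode 2>/dev/null |
        awk -F'[:=] *' '/maintenance-mode/ { print $2; exit }')
    echo "${mm_value:-false}"
}
```

Running `maintenance_mode_state` (or `maintenance_mode_state show` on an older pcs) right after unsetting the property is a quick way to make sure the cluster was not left in maintenance mode by accident.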
If only one particular resource managed by Pacemaker is affected, setting that resource to unmanaged might be an option in some cases.
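As an illustration of the single-resource approach, the sketch below wraps a command between `pcs resource unmanage` and `pcs resource manage`. The helper name with_unmanaged and the resource ID used in the usage note are hypothetical. Treat it as a starting point under these assumptions, not as a tested procedure.

```shell
# Run a command while one resource is unmanaged, then return the
# resource to cluster control.
# Usage: with_unmanaged <resource-id> <command> [args...]
with_unmanaged() {
    um_resource=$1
    shift
    pcs resource unmanage "$um_resource" || return 1
    um_status=0
    "$@" || um_status=$?
    # The resource is handed back even if the command failed, so
    # Pacemaker will then act on whatever state it finds.
    pcs resource manage "$um_resource" || return 1
    return "$um_status"
}
```

For example, `with_unmanaged my_vip ./reconfigure-vip.sh` would run a hypothetical script while only the hypothetical resource my_vip is unmanaged. The rest of the cluster stays under Pacemaker's control throughout.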
Root Cause
When the cluster is in maintenance mode, Pacemaker stops managing resources and stops all monitor operations on the running resources. This means that you can manually stop and start resources without causing Pacemaker to take action.
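The set, work, unset sequence can be sketched as a small wrapper. The function name run_in_maintenance_mode is hypothetical, and the choice to stay in maintenance mode when the command fails is a design assumption of this sketch: it gives the administrator a chance to inspect the resources before Pacemaker resumes control and reacts to their state.

```shell
# Enable maintenance mode, run a command, and leave maintenance mode
# only if the command succeeded.
# Usage: run_in_maintenance_mode <command> [args...]
run_in_maintenance_mode() {
    pcs property set maintenance-mode=true || return 1
    mm_status=0
    "$@" || mm_status=$?
    if [ "$mm_status" -ne 0 ]; then
        # Assumption: on failure, keep the cluster hands-off so the
        # resources can be checked before Pacemaker resumes control.
        echo "command failed (exit $mm_status); cluster left in maintenance mode" >&2
        return "$mm_status"
    fi
    pcs property unset maintenance-mode
}
```

A call such as `run_in_maintenance_mode ./disk-maintenance.sh` (a hypothetical script) keeps the window in which the cluster is unmanaged as short as the work itself.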
Below is a snippet from a man page for reference:
Maintenance Mode tells the cluster to go to a `hands off` mode, and not start or stop any services until told otherwise. When maintenance-mode is completed, the cluster does a sanity check of the current state of any services, and then stops or starts any that need it.
If you modify a resource while in maintenance mode, the resource has to be reloaded once you leave maintenance mode. This could cause resources that depend on it, through constraints or membership in the same group, to be stopped first.