What are the `on-fail` options that can be configured for cluster resources?
Environment
- Red Hat Enterprise Linux Server 6, 7, 8, 9 (with the High Availability Add On)
Issue
- How to set the pacemaker resource `on-fail` options?
- What is the meaning of these options?
- What are the `on-fail` options that can be configured for cluster resources?
Resolution
The `on-fail` option of a resource operation accepts the following values:
- `ignore`: Pretend the resource did not fail
- `block`: Do not perform any further operations on the resource
- `stop`: Stop the resource and do not start it elsewhere
- `restart`: Stop the resource and start it again (possibly on a different node)
- `fence`: STONITH the node on which the resource failed
- `standby`: Move all resources away from the node on which the resource failed
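As a quick sanity check before passing a value to `pcs`, the list above can be encoded in a small shell function. This is only a sketch: the function name `is_valid_on_fail` is hypothetical, and it covers only the values listed in this article, so values introduced by other Pacemaker versions are not recognized.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): succeeds only for the
# on-fail values listed in this article.
is_valid_on_fail() {
  case "$1" in
    ignore|block|stop|restart|fence|standby) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: guard a script before it builds a pcs command.
if is_valid_on_fail "fence"; then
  echo "fence is a listed on-fail value"
fi
```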
Modify, Add, or Remove resource operation actions
Update an existing resource operation action.
# pcs resource update <resource_id> op <operation_action> interval=0s timeout=40s
Add an operation action to an existing resource with the following command.
# pcs resource op add <resource_id> <operation_action> [operation_properties]
Delete an existing operation action configured on a resource.
# pcs resource op remove <resource_id> <operation_name> [operation_properties]
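As an illustration of the update form applied to `on-fail`, the sketch below changes the `on-fail` value of a monitor operation. The resource name `dummy1`, the value `standby`, the interval and timeout, the `DRY_RUN` variable, and the helper names `run` and `set_monitor_on_fail` are all assumptions made for the example. By default the script only prints the command so that it can be reviewed before it is run on a cluster node.

```shell
#!/bin/sh
# Sketch only: "dummy1" and on-fail=standby are illustrative choices.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: $1 = resource id, $2 = on-fail value.
set_monitor_on_fail() {
  run pcs resource update "$1" op monitor interval=30s timeout=10s on-fail="$2"
}

set_monitor_on_fail dummy1 standby
```

Setting `DRY_RUN=0` executes the command instead of printing it. The resulting operation can then be reviewed with `pcs resource config dummy1` on Red Hat Enterprise Linux 8 and later, or `pcs resource show dummy1` on Red Hat Enterprise Linux 7.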
Example
To have the cluster fence the node when a monitor operation fails, set `on-fail=fence` on the monitor operation. The following command creates a `Dummy` resource configured this way.
# pcs resource create dummy1 ocf:heartbeat:Dummy op monitor interval=30s timeout=10s on-fail=fence
To test that the cluster node is fenced when the resource's monitor fails, remove the "state" file on the node where the resource is running.
# rm /var/run/resource-agents/Dummy-dummy1.state
Note the following about resource operation actions:
- The default for the `stop` operation is `fence` when STONITH is enabled and `block` otherwise. All other operations default to `restart`.
- Do not set `on-fail=stop` on a resource `stop` operation. The cluster will not report an error when the option is set, but this is not a valid setting. As a consequence, the cluster ignores the `on-fail` parameter, falls back to the default behavior, and logs messages like the following:
Feb 18 16:50:00 nodeb pacemaker-schedulerd[1244]: error: Resetting 'on-fail' for my_resource stop action to default value because 'stop' is not allowed for stop
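The two rules above can be summarized in a short shell sketch. Both function names, `default_on_fail` and `check_on_fail`, are hypothetical helpers written for this article, not part of pcs or Pacemaker. They only restate the defaults and the restriction described here.

```shell
#!/bin/sh
# Hypothetical helper: prints the default on-fail value described above.
# $1 = operation name, $2 = "true" if STONITH is enabled
default_on_fail() {
  if [ "$1" = "stop" ]; then
    if [ "$2" = "true" ]; then
      echo fence
    else
      echo block
    fi
  else
    echo restart
  fi
}

# Hypothetical helper: rejects the combination that the scheduler resets.
# $1 = operation name, $2 = requested on-fail value
check_on_fail() {
  if [ "$1" = "stop" ] && [ "$2" = "stop" ]; then
    echo "on-fail=stop is not allowed for the stop operation" >&2
    return 1
  fi
  return 0
}

default_on_fail stop true      # prints: fence
default_on_fail monitor true   # prints: restart
```

A script that generates `pcs` commands could call `check_on_fail` first, so that the invalid setting is caught before it reaches the cluster configuration rather than being found later in the logs.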
For more information:
- Table 6.4. Properties of an Operation | Red Hat Enterprise Linux 7
- 6.6.2. Configuring Global Resource Operation Defaults | Red Hat Enterprise Linux 7
- Table 20.1. Properties of an Operation | Red Hat Enterprise Linux 8
- 20.1. Configuring resource monitoring operations | Red Hat Enterprise Linux 8