What are the `on-fail` options that can be configured for cluster resources?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux Server 6, 7, 8, 9 (with the High Availability Add-On)

Issue

  • How do I set the Pacemaker resource on-fail options?
  • What do these options mean?
  • What are the on-fail options that can be configured for cluster resources?

Resolution

The on-fail option of a resource operation can be set to any of the following values:

  • ignore: Pretend the resource did not fail
  • block: Do not perform any further operations on the resource
  • stop: Stop the resource and do not start it elsewhere
  • restart: Stop the resource and start it again (possibly on a different node)
  • fence: STONITH the node on which the resource failed
  • standby: Move all resources away from the node on which the resource failed
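
As a minimal sketch, the values listed above can be checked in a shell script before they are passed to pcs. The function name `is_valid_on_fail` is hypothetical and is not part of pcs or Pacemaker; it covers only the six values listed in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): succeed only for the
# on-fail values listed in this article.
is_valid_on_fail() {
    case "$1" in
        ignore|block|stop|restart|fence|standby) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: validate a value before building a pcs command line.
value="fence"
if is_valid_on_fail "$value"; then
    echo "on-fail=$value"
else
    echo "unsupported on-fail value: $value" >&2
fi
```

The check is deliberately a plain `case` statement so that it works in any POSIX shell and is easy to extend if a release supports additional values.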

Modify, Add, or Remove resource operation actions

Update an existing resource operation action.

# pcs resource update <resource_id> op <operation_action> interval=0s timeout=40s

Add an operation action to an existing resource with the following command.

# pcs resource op add <resource_id> <operation_action> [operation_properties]

Delete an existing operation action configured on a resource.

# pcs resource op remove <resource_id> <operation_name> [operation_properties]
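
The three commands above could look as follows for a concrete resource. This is a sketch: the resource name `dummy1` and the interval and timeout values are assumptions chosen for illustration. The hypothetical `run` wrapper only prints each command unless `DRY_RUN=0` is set, so the sequence can be reviewed before it is applied to a cluster.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default, execute it only
# when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "# $*"
    fi
}

# Update the existing monitor operation so that a failure fences the node.
run pcs resource update dummy1 op monitor interval=30s timeout=10s on-fail=fence

# Add a second monitor operation (assumed interval) that blocks on failure.
run pcs resource op add dummy1 monitor interval=60s timeout=20s on-fail=block

# Remove that second monitor operation again.
run pcs resource op remove dummy1 monitor interval=60s timeout=20s on-fail=block
```

Running the script with `DRY_RUN=0` as root on a cluster node would execute the commands instead of printing them.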

Example

To have the cluster fence the node when a resource's monitor operation fails, set `on-fail=fence` on the monitor operation. For example, for a `Dummy` resource:

# pcs resource create dummy1 ocf:heartbeat:Dummy op monitor interval=30s timeout=10s on-fail=fence

To test that the cluster node is fenced when the resource's monitor fails, remove the "state" file on the node where the resource is running.

# rm /var/run/resource-agents/Dummy-dummy1.state
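
A slightly more defensive variant of the removal step is sketched below. The function name `remove_state_file` is hypothetical; the default path is the one used in the command above and depends on the resource name.

```shell
#!/bin/sh
# Hypothetical helper: remove a Dummy state file only if it exists,
# so that a wrong path or the wrong node is noticed immediately.
remove_state_file() {
    state_file="${1:-/var/run/resource-agents/Dummy-dummy1.state}"
    if [ -e "$state_file" ]; then
        rm -- "$state_file" && echo "removed $state_file"
    else
        echo "state file not found: $state_file" >&2
        return 1
    fi
}
```

Calling `remove_state_file` without arguments on the node that runs `dummy1` removes the default file. Once the next monitor operation has run, `pcs status` on a surviving node can be used to confirm that the node was fenced.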

  • The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to restart.
  • Do not set on-fail=stop on a resource's stop operation. No error is reported when the setting is configured, but it is not valid. The cluster ignores the on-fail value, applies the default behavior, and logs messages like the following:
Feb 18 16:50:00 nodeb pacemaker-schedulerd[1244]: error: Resetting 'on-fail' for my_resource stop action to default value because 'stop' is not allowed for stop
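
To check whether a cluster has already logged this message, the log can be searched for the fixed part of the text. This is a sketch: the function name `find_on_fail_resets` is hypothetical, and the log file location is an assumption that varies by release and logging configuration (for example `/var/log/messages`), so it is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print log lines where the scheduler reset an
# invalid on-fail setting. Pass the log file(s) to search as arguments.
find_on_fail_resets() {
    grep -h -F "Resetting 'on-fail'" -- "$@"
}
```

For example, `find_on_fail_resets /var/log/messages` prints any matching lines and returns a non-zero exit status when none are found.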

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.