What are the `on-fail` options that can be configured for cluster resources?

Solution Verified - Updated 18 Feb 2025

Environment

Red Hat Enterprise Linux Server 6, 7, 8, 9 (with the High Availability Add-On)

Issue

How do I set the `on-fail` option for a Pacemaker resource operation?
What do the `on-fail` values mean?
What `on-fail` options can be configured for cluster resources?

Resolution

The `on-fail` option of a resource operation accepts the following values:

ignore: Pretend the resource did not fail
block: Do not perform any further operations on the resource
stop: Stop the resource and do not start it elsewhere
restart: Stop the resource and start it again (possibly on a different node)
fence: STONITH the node on which the resource failed
standby: Move all resources away from the node on which the resource failed
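
As a minimal sketch, the hypothetical shell function below (`is_valid_on_fail` is not part of pcs or Pacemaker) checks a proposed value against the six values listed above before it is passed to `pcs`. Newer Pacemaker releases may accept additional values, so treat this list as an assumption taken from this article.

```shell
#!/bin/sh
# Hypothetical helper, not part of pcs or Pacemaker: succeed (exit 0)
# only if $1 is one of the on-fail values listed in this article.
is_valid_on_fail() {
    case "$1" in
        ignore|block|stop|restart|fence|standby) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check a value before using it in a pcs command.
value="fence"
if is_valid_on_fail "$value"; then
    echo "on-fail=$value is a recognized value"
else
    echo "on-fail=$value is not a recognized value" >&2
fi
```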

Modify, Add, or Remove resource operation actions

Update an existing operation on a resource (the `interval` and `timeout` values shown are examples):

# pcs resource update <resource_id> op <operation_action> interval=0s timeout=40s

Add an operation action to an existing resource with the following command.

# pcs resource op add <resource_id> <operation_action> [operation_properties]

Delete an existing operation action configured on a resource.

# pcs resource op remove <resource_id> <operation_action> [operation_properties]
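
The three commands above can be reviewed before they touch a cluster by printing them first. The sketch below uses a hypothetical wrapper, `pcs_cmd`, that only echoes each command unless `RUN=1` is set; the resource name `dummy1` and the interval, timeout, and `on-fail` values are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper: print the pcs command by default; run it only
# when RUN=1 is set in the environment (for example on a test cluster).
pcs_cmd() {
    if [ "${RUN:-0}" = "1" ]; then
        pcs "$@"
    else
        echo "pcs $*"
    fi
}

# Update the existing monitor operation of dummy1 so that a failure blocks
# further operations on the resource instead of restarting it.
pcs_cmd resource update dummy1 op monitor interval=30s timeout=10s on-fail=block

# Add a second monitor operation with a different interval.
pcs_cmd resource op add dummy1 monitor interval=60s timeout=20s on-fail=restart

# Remove that second monitor operation again. Pass the same properties
# used when adding it so that pcs can match the operation.
pcs_cmd resource op remove dummy1 monitor interval=60s timeout=20s on-fail=restart
```

Printing first makes it easy to confirm which operation and which `on-fail` value each command targets before running it with `RUN=1`.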

Example

To have the cluster fence a node when a resource's monitor operation fails on it, set `on-fail=fence` on that monitor operation. For example, for a `Dummy` resource:

# pcs resource create dummy1 ocf:heartbeat:Dummy op monitor interval=30s timeout=10s on-fail=fence

To test that the node is fenced when the monitor fails, remove the resource's state file on the node where the resource is running:

# rm /var/run/resource-agents/Dummy-dummy1.state

The default `on-fail` value for the stop operation is `fence` when STONITH is enabled and `block` otherwise. All other operations default to `restart`.

Do not set `on-fail=stop` on a resource's stop operation. The cluster does not reject this setting when it is configured, but it is not valid: the cluster ignores the `on-fail` parameter, applies the default behavior instead, and logs messages like the following:

Feb 18 16:50:00 nodeb pacemaker-schedulerd[1244]: error: Resetting 'on-fail' for my_resource stop action to default value because 'stop' is not allowed for stop
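
The default and fallback rules described above can be summarized in a small shell function. This is a sketch that models only the rules stated in this article, not Pacemaker's full scheduler logic; the function name `effective_on_fail` and its arguments (operation name, configured `on-fail` value or an empty string, and `true` when STONITH is enabled) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the on-fail behavior this article describes for
#   $1 = operation name (start, stop, monitor, ...)
#   $2 = configured on-fail value, or "" if none is set
#   $3 = "true" if STONITH is enabled, anything else otherwise
# Only the rules from this article are modeled, not all of Pacemaker's logic.
effective_on_fail() {
    op=$1
    configured=$2
    stonith=$3

    # on-fail=stop is not allowed for a stop operation; the cluster
    # ignores it and falls back to the default, as if nothing was set.
    if [ "$op" = "stop" ] && [ "$configured" = "stop" ]; then
        echo "warning: 'stop' is not allowed for stop, using the default" >&2
        configured=""
    fi

    if [ -n "$configured" ]; then
        echo "$configured"
    elif [ "$op" = "stop" ]; then
        # Stop defaults to fence with STONITH enabled, block otherwise.
        if [ "$stonith" = "true" ]; then echo fence; else echo block; fi
    else
        # Every other operation defaults to restart.
        echo restart
    fi
}

effective_on_fail monitor "" true      # restart
effective_on_fail stop "" false        # block
effective_on_fail stop stop true       # fence (plus a warning on stderr)
```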

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.