Pacemaker resource becomes FAILED (blocked)
Environment
- Red Hat Enterprise Linux (RHEL) 6
- Red Hat Enterprise Linux (RHEL) 7
- Red Hat Enterprise Linux (RHEL) 8
- Red Hat Enterprise Linux (RHEL) 9
- High-Availability or Resilient Storage Add-on
- Pacemaker Cluster
Issue
- Running pcs status reveals a FAILED (blocked) resource:
Full list of resources:
 VIP (ocf::heartbeat:IPaddr): Started node1
 DB (lsb::startdb): FAILED node1 (blocked)
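On a cluster with many resources it can help to filter the status output down to the blocked entries. The following is a minimal sketch, assuming each resource line contains both FAILED and (blocked) as in the output above. The function name list_blocked_resources is hypothetical and is not part of pcs.

```shell
#!/bin/sh
# Hypothetical helper: print the names of resources reported as
# FAILED ... (blocked). Reads `pcs status` text on standard input.
# Assumption: the resource name is the first field of the line, optionally
# preceded by a "*" bullet as printed by some pcs versions.
list_blocked_resources() {
    awk '/FAILED/ && /\(blocked\)/ {
        sub(/^[ \t]*\*?[ \t]*/, "")   # drop leading whitespace and bullet
        print $1
    }'
}

# Example usage on a cluster node:
#   pcs status | list_blocked_resources
```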
Resolution
To resolve the issue that led to the FAILED state, further troubleshooting is required to determine the source of the error within the resource agent. See the "Diagnostic Steps" section for a possible troubleshooting process and steps to identify the source of the issue.
The FAILED (blocked) state may also indicate errors in the cluster's stonith or on-fail configuration. See the "Root Cause" section for further information on possible contributors to this issue.
After resolving any resource operation errors and/or resolving configuration issues, the cleanup command can be used to clear any errors and test starting the resource again within the cluster:
$ pcs resource cleanup
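The cleanup can also be limited to a single resource by passing its name, followed by a status check to confirm the result. The following is a minimal sketch. The function name cleanup_and_recheck and the PCS variable (which only allows a dry run with another command such as echo) are hypothetical and not part of pcs.

```shell
#!/bin/sh
# Command used to talk to the cluster. Overridable for dry runs,
# for example PCS=echo prints the commands instead of running them.
PCS=${PCS:-pcs}

# Hypothetical helper: clear the recorded errors for one resource and
# then print the cluster status so the new state can be reviewed.
cleanup_and_recheck() {
    resource=$1
    if [ -z "$resource" ]; then
        echo "usage: cleanup_and_recheck <resource>" >&2
        return 2
    fi
    "$PCS" resource cleanup "$resource" || return 1
    "$PCS" status
}

# Example usage on a cluster node:
#   cleanup_and_recheck DB
```

The status is printed immediately after the cleanup, so the resource may still be in transition. Running pcs status again after a short wait shows the settled state.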
A support case can be opened with Red Hat if further assistance is needed.
Root Cause
Pacemaker will track the status of every resource operation against the cluster node it is running on. To confirm that a resource is successfully running on any node, Pacemaker will capture the return code when starting, stopping or performing other operations through its OCF or LSB script.
Pacemaker will mark the resource as FAILED (blocked) under any of the following conditions:
- The resource script returns a non-zero value during any operation that has an on-fail value set to fence, and fencing is not working or is disabled.
  - The default on-fail action for a stop operation is fence, so a failure to stop without working fencing will result in this status.
  - Note: A working stonith device with cluster fencing enabled is a requirement for Pacemaker support, and issues affecting stonith will need to be corrected for full support of Pacemaker clusters.
- The on-fail action is set to block for a resource operation and a failure is observed.
- In some cases, if the resource is unable to run on any node, the FAILED status will be reported.
- Other fail conditions may apply.
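The first two conditions can be restated as a small decision function. This is only an illustrative sketch of the rules described above. It does not query a cluster and does not reflect how Pacemaker is implemented. The function name expected_outcome and its argument values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: models the documented on-fail rules for a failed
# (non-zero) resource operation.
#
# Arguments:
#   $1  operation name (for example: start, stop, monitor)
#   $2  configured on-fail value, or "default" if none is set
#   $3  fencing state: "working" or "broken" (not working or disabled)
#
# Prints:
#   blocked  the resource is expected to show FAILED (blocked)
#   fenced   the node is expected to be fenced instead
#   other    another recovery policy applies (outside this sketch)
expected_outcome() {
    op=$1
    on_fail=$2
    fencing=$3

    # Per the text above, a stop operation defaults to on-fail=fence.
    if [ "$on_fail" = "default" ] && [ "$op" = "stop" ]; then
        on_fail=fence
    fi

    case "$on_fail" in
        block)
            echo "blocked" ;;
        fence)
            if [ "$fencing" = "working" ]; then
                echo "fenced"
            else
                echo "blocked"
            fi ;;
        *)
            echo "other" ;;
    esac
}

# Example: a failed stop with no on-fail setting and fencing disabled.
#   expected_outcome stop default broken
```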
Diagnostic Steps
In order to further troubleshoot issues affecting resources on startup and/or stop the following process can be used:
- Disable the resource and reboot the node prior to taking any further troubleshooting steps:
  - Attempting to start a resource that is in this blocked state following a fenceable issue can lead to unexpected behavior or even data corruption, so it is important to reboot the node before running the debug-start or debug-stop steps.
# Stop the resource from automatically starting in the cluster on boot:
$ pcs resource disable <resource>
# (If there are pending fences) Reboot the node to recover from a possible unclean state:
$ reboot
- To start and/or stop the resource with additional debug output, the commands below can be run:
  - Note: These commands will not start the resource under Pacemaker's management. It is recommended to perform these actions only while the resource is in a "disabled" state to avoid conflicts:
# Run from the node you wish to start the resource on to collect more info on start operations:
# Optionally include `--full` option for full traces.
$ pcs resource debug-start <resource-name> --full
# Run from the node you wish to stop the resource on to collect more info on stop operations:
# Optionally include `--full` option for full traces.
$ pcs resource debug-stop <resource-name> --full
- A return code of 0 means the "start" or "stop" operations were successful. Non-zero return codes will require further troubleshooting specific to the cause of the error.
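For OCF resource agents, the non-zero return codes have standard names that can narrow down the cause. The following is a minimal sketch that maps the common OCF codes to their names. The function name ocf_rc_name is hypothetical. LSB scripts such as lsb:startdb follow the LSB init script conventions instead, so this mapping is an assumption that applies only to OCF agents.

```shell
#!/bin/sh
# Hypothetical helper: translate an OCF resource agent return code into
# its standard name. Codes outside 0-7 are reported as UNKNOWN here.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;            # operation succeeded
        1) echo "OCF_ERR_GENERIC" ;;        # unspecified error
        2) echo "OCF_ERR_ARGS" ;;           # invalid arguments
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;  # action not supported by agent
        4) echo "OCF_ERR_PERM" ;;           # insufficient permissions
        5) echo "OCF_ERR_INSTALLED" ;;      # required software missing
        6) echo "OCF_ERR_CONFIGURED" ;;     # resource misconfigured
        7) echo "OCF_NOT_RUNNING" ;;        # resource is cleanly stopped
        *) echo "UNKNOWN" ;;
    esac
}

# Example: pass the return code reported by debug-start or debug-stop.
#   ocf_rc_name 5
```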