RHEL High Availability Documentation: Testing and Troubleshooting Procedures
NOTE: For a general overview of the documentation available for the Red Hat High Availability Add-On, see the Red Hat High Availability Add-On Documentation Guide.
Troubleshooting should begin when you first configure your system, as you create your fencing devices and system resources. It is important to test your system at each configuration step before proceeding to the next. For example, if you configure a fence agent and then add many resources without testing along the way, you may encounter problems that you could have caught earlier by stopping right after fence configuration and testing your fence devices.
It is particularly important to test your fencing devices during initial configuration, even if you do not see any issues immediately. A situation that requires a node to be fenced might not arise for many months, and if you did not fully test your fence configuration at the outset, you will not discover a problem in that configuration until then.
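As a rough illustration of testing right after fence configuration, the following Python sketch wraps the `pcs stonith fence` command, which asks the cluster to fence a named node. The helper names (`fence_test_command`, `run_fence_test`), the dry-run default, and the node name `node2.example.com` are assumptions made for this example and are not part of any Red Hat tooling. The linked "Testing a fence device" documents describe the supported procedure.

```python
import shlex
import subprocess
from typing import List


def fence_test_command(node: str) -> List[str]:
    """Build the pcs command that asks the cluster to fence one node."""
    # Reject empty names and anything that could be parsed as an option.
    if not node or node.startswith("-"):
        raise ValueError("node must be a non-empty cluster node name")
    return ["pcs", "stonith", "fence", node]


def run_fence_test(node: str, dry_run: bool = True) -> str:
    """Return the command line (dry run) or execute the fence test.

    WARNING: with dry_run=False this really fences the named node,
    so use it only during a planned test window.
    """
    cmd = fence_test_command(node)
    if dry_run:
        # Only show what would be run; nothing is executed.
        return shlex.join(cmd)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # "node2.example.com" is a placeholder node name (assumption).
    print(run_fence_test("node2.example.com"))
```

By default the helper only returns the command line it would run, so it is safe to call on a node that is not part of a cluster. Passing `dry_run=False` on a cluster node would actually fence the target.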
Red Hat provides the following documentation to help you test your system and its components.
General testing and troubleshooting documentation
- Testing a fence device (RHEL 10)
- Testing a fence device (RHEL 9)
- Testing a fence device (RHEL 8)
- What is the proper way to simulate a network failure on a RHEL Cluster?
- Interpreting resource agent OCF return codes (RHEL 10)
- Interpreting resource agent OCF return codes (RHEL 9)
- Interpreting resource agent OCF return codes (RHEL 8)
- Diagnostic Procedures for RHEL High Availability Clusters - Resources in RHEL 7, 8
- Diagnostic Procedures for RHEL High Availability Clusters - Troubleshooting fencing problems in RHEL 6, 7, or 8
- Diagnostic Procedures for RHEL High Availability Clusters - General Membership and Communication Troubleshooting in RHEL 7, 8
- Pacemaker resource becomes FAILED (blocked)
Using the kdump feature to troubleshoot your cluster
- How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- How do I configure kdump for use with the RHEL 6, 7, 8 High Availability Add-On?
- How do I configure fence_kdump in a Red Hat Pacemaker cluster?
Troubleshooting specific errors
- Node using iSCSI LUNs has reservation conflicts during VMware virtual machine snapshot
- A node was fenced after "A processor failed, forming new configuration" in a RHEL 6 or 7 High Availability cluster
- High Availability cluster node logs the message "Corosync main process was not scheduled for X ms (threshold is Y ms). Consider token timeout increase"
- How to change totem token timeout value in a RHEL 5, 6, 7, or 8 High Availability cluster?
- A stonith resource attempts to fence a cluster node while it is in stopped state on a pacemaker cluster
- RHEL High Availability cluster nodes on IBM z Systems experience STONITH-device timeouts around midnight on a nightly basis
- Pacemaker's "db2" resource fails to start due to database not being in a HADR configuration
- Why does the LVM-activate resource fails to activate the VG with ERROR: vg_name: failed to activate?
Contacting technical support
- Contacting Technical Support
- What information is required for Red Hat Global Support Services to troubleshoot a High Availability or Resilient Storage issue?
- Red Hat Support Severity Level Definitions
- Qualifications for 24x7 Customer Support
- How do I join a remote support session via Bomgar?
- How does Red Hat define standard business hours?
- Record Linux terminal session using GNU script
- How do I log activity in a terminal session to give to Red Hat Support?
- Collecting supplemental system utilization statistics for fence events or performance problems in RHEL High Availability or Resilient Storage clusters