How do you use fence_manual and fence_ack_manual?

Overview

This article describes the fence agent fence_manual and the command line tool fence_ack_manual and provides information about when these tools are appropriate and supportable to use. For more information, see the tech brief at fencing-red-hat-enterprise-linux-methods-use-cases-and-failover.

Environment

  • Red Hat Cluster Suite 4+
  • Red Hat Enterprise Linux Server 5 (with the High Availability Add-On)
  • Red Hat Enterprise Linux Server 6 (with the High Availability Add-On)

Manual fencing

Manual fencing is provided by the fence agent called fence_manual. It is a dummy agent: it takes no external action to perform I/O fencing. Its only function is to wait for someone to manually override it via the fence_ack_manual command line tool.

Using fence_manual means that every node failure in your cluster requires manual intervention before cluster operations can continue, including relocating a failed service to a functional node. For this reason fence_manual is unsupported: it does not provide high availability unless a system administrator monitors the cluster 24 hours a day. In addition, it can cause corruption of cluster resources if used incorrectly.

The original purpose of fence_manual was to act solely as a secondary fence agent. For example, you would configure your primary fence agent to be fence_ipmilan and your backup fence agent to be fence_manual. If multiple attempts to issue a fence operation via the primary fence agent (fence_ipmilan) failed, the secondary fence agent would fire. In this case the secondary fence agent would wait to be overridden manually via fence_ack_manual.
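
The primary/backup ordering described above can be illustrated with a small simulation. This is only a sketch of the idea, not the cluster's fencing daemon: the function names, the retry count of 3, and the node names are all hypothetical, and the stand-in for fence_manual simply reports whether an operator has acknowledged the node instead of blocking until one does.

```python
from typing import Callable, List, Optional, Set, Tuple

# A fence agent is modeled as a function: node name -> True if fencing succeeded.
FenceAgent = Callable[[str], bool]


def unreachable_ipmi_agent(node: str) -> bool:
    """Hypothetical stand-in for a primary agent (such as fence_ipmilan)
    whose fence operation fails every time."""
    return False


def make_manual_agent(acknowledged: Set[str]) -> FenceAgent:
    """Hypothetical stand-in for fence_manual: it takes no action itself and
    succeeds only for nodes an operator has already acknowledged."""
    def manual_agent(node: str) -> bool:
        return node in acknowledged
    return manual_agent


def fence_node(
    node: str,
    methods: List[Tuple[str, FenceAgent]],
    attempts_per_method: int = 3,  # assumed retry count, for illustration only
) -> Tuple[Optional[str], List[Tuple[str, int, bool]]]:
    """Try each method in order; return the name of the method that
    succeeded (or None) together with a log of every attempt."""
    log = []
    for name, agent in methods:
        for attempt in range(1, attempts_per_method + 1):
            ok = agent(node)
            log.append((name, attempt, ok))
            if ok:
                return name, log
    return None, log


if __name__ == "__main__":
    methods = [
        ("fence_ipmilan", unreachable_ipmi_agent),        # primary
        ("fence_manual", make_manual_agent({"node2"})),   # backup
    ]
    winner, log = fence_node("node2", methods)
    for entry in log:
        print(entry)
    print("fenced by:", winner)
```

In this model the backup method is reached only after the primary has used up all of its attempts, which is the delay the article goes on to describe as a drawback.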

This usage of fence_manual was suboptimal because the primary agent had to fail multiple times before the fence operation could be overridden. fence_manual was also often misused as a primary fence agent. This led to fence_manual becoming unsupported.

fence_manual is not shipped at all in RHEL 6 HA; only fence_ack_manual is shipped.

Fencing override

To eliminate the issues seen with fence_manual, the fence_ack_manual tool was enhanced to work with all standard fence agents. This means that any fence agent will respond to an override via this command. This removes the need to configure fence_manual as a secondary or backup fence device, but still lets the administrator resume cluster operation after manually verifying that the failed node is truly down.

Supportability

  • Usage of fence_manual is not supported in any production cluster. You may use this fence agent for development or debugging purposes only.
  • Usage of fence_ack_manual is fully supported in conjunction with any fence agent that we ship.

Before issuing a fence_ack_manual override, the system administrator must first verify that the node being overridden has been cut off from I/O to shared storage or filesystems. This can be accomplished in several ways:

  • Pull the power on the node.
  • Pull network and/or fibre channel cables as appropriate.
  • Disable FC or Ethernet switch ports.
  • Verify that the cluster node has become completely disabled and has no chance of returning to operation.
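
The verification requirement above can be expressed as a simple guard that an operator script might run before any override is issued. This is a minimal sketch under stated assumptions: the class, field, and function names are hypothetical, it treats any one of the listed methods as sufficient (following "in several ways"), and it does not invoke fence_ack_manual itself.

```python
from dataclasses import dataclass


@dataclass
class IsolationChecks:
    """Hypothetical record of what the administrator has physically verified."""
    power_pulled: bool = False
    cables_pulled: bool = False
    switch_ports_disabled: bool = False
    node_confirmed_dead: bool = False

    def verified(self) -> bool:
        # Assumption: any single method from the list cuts the node off from I/O.
        return any((
            self.power_pulled,
            self.cables_pulled,
            self.switch_ports_disabled,
            self.node_confirmed_dead,
        ))


def confirm_override(node: str, checks: IsolationChecks) -> str:
    """Refuse to proceed unless the node's I/O isolation has been verified."""
    if not checks.verified():
        raise RuntimeError(
            f"refusing to override fencing for {node}: I/O isolation not verified"
        )
    return f"{node}: I/O isolation verified, override may proceed"


if __name__ == "__main__":
    print(confirm_override("node2", IsolationChecks(power_pulled=True)))
```

The guard records only what a person has checked; it cannot itself confirm that the node is cut off from shared storage.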