Is it possible to configure a RHEL cluster such that it will function with only 1 node up?

Solution Verified - Updated

Environment

  • Red Hat Cluster Suite 4+
  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
    • cman for cluster membership and quorum
    • NOTE: See this solution for more information on achieving similar goals in RHEL 7 with pacemaker and votequorum
  • 3 or more nodes in the cluster

Issue

  • I want to configure a cluster so that only 1 node needs to be up for the cluster to function.
  • Can a last man standing configuration be used in a RHEL High Availability cluster?
  • How can I configure qdiskd to provide votes such that one node can function by itself?

Resolution

Yes, a cluster can be configured to allow for a "last man standing" configuration by using a quorum device.

Before implementing this, please consider the following:
When weighing cluster design options, there is almost always a better design that does not require "last man standing" functionality. It is also important to remember that quorum is not a restriction, but a tool that the cluster uses to preserve data and service integrity. Working around quorum makes a deployment more failure-prone rather than more robust. If you are considering deploying a "last man standing" cluster, please consider contacting Red Hat Global Support Services before putting it into production, so that Red Hat can review the design and potentially suggest alternate strategies or ways to mitigate problems.

It is especially important to determine whether a single node can effectively handle the load of the entire cluster by itself. If the cluster is configured to continue functioning with just a single member online, and all work that would normally be spread across the cluster falls on that one node, then the node is at risk of becoming overwhelmed and being unable to service any of its processes effectively. A "last man standing" configuration should only be employed if a single node has adequate resources to handle all of the work that will occur in such a situation.

Implementation
To implement a "last man standing" configuration, configure the quorum device so that its number of votes equals the total number of node votes minus 1. For example, if there are 3 nodes with 1 vote each, the quorum device would offer 2 votes. This would make expected_votes="5", and quorum would be 3 (more than half of the expected votes). Any one node that has access to the quorum device would have its own vote plus 2 from the quorum device, and thus can maintain quorum by itself.
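
The vote arithmetic above could be laid out in /etc/cluster/cluster.conf roughly as follows. This is a minimal sketch rather than a complete configuration: the cluster name, node names, and quorum disk label are hypothetical, and the fencing configuration and the quorum disk heuristic that a real cluster needs are left out to keep the focus on votes.

```xml
<?xml version="1.0"?>
<!-- Illustrative cluster.conf sketch; cluster name, node names, and the
     qdisk label are hypothetical. Fencing and the qdisk heuristic are
     left out to keep the focus on votes. -->
<cluster name="example" config_version="1">
  <clusternodes>
    <!-- Three nodes with 1 vote each: 3 node votes in total -->
    <clusternode name="node1.example.com" nodeid="1" votes="1"/>
    <clusternode name="node2.example.com" nodeid="2" votes="1"/>
    <clusternode name="node3.example.com" nodeid="3" votes="1"/>
  </clusternodes>
  <!-- Quorum device votes = node votes minus 1 = 2.
       Expected votes = 3 + 2 = 5, so quorum = 3.
       One node (1) plus the quorum device (2) reaches 3. -->
  <quorumd label="example_qdisk" votes="2"/>
</cluster>
```

With these values, a lone surviving node that still has access to the quorum device holds 3 of the 5 expected votes, while a node that loses access to the quorum device holds only 1 and cannot maintain quorum on its own.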

It is important to configure a heuristic (which is required in clusters of 3 nodes or more) that detects network splits on the cluster interconnect. Without one, a network split can lead to a split-brain situation or a fence loop.
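
As a rough illustration of such a heuristic, the quorumd section of cluster.conf can carry a heuristic element that runs a command at regular intervals. In the sketch below, the label, the ping target 192.0.2.1, and every timing value are placeholders chosen for illustration, not recommendations. Which address best indicates a healthy cluster interconnect depends on the network design.

```xml
<!-- Illustrative quorumd fragment; the label, the ping target 192.0.2.1,
     and all timing values are placeholders, not recommendations. -->
<quorumd label="example_qdisk" votes="2" interval="1" tko="10" min_score="1">
  <!-- The heuristic passes while this node can reach the chosen address.
       A node whose total score falls below min_score is no longer
       considered fit by qdiskd. -->
  <heuristic program="ping -c1 -w1 192.0.2.1" score="1" interval="2" tko="3"/>
</quorumd>
```

The timing values interact with the cluster's own failure-detection timeouts, so they should be reviewed against the qdisk documentation before use.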

Root Cause

The cluster's quorum algorithm prevents a cluster from operating if fewer than a certain number of nodes are up and operational. When considering cluster designs, people often wonder whether the cluster would be more fault tolerant if it were unhampered by quorum (allowing the cluster to sustain the loss of a majority of its nodes). The idea that being able to sustain a majority node loss and remain operational makes the cluster more fault tolerant is mistaken; in fact, the exact opposite is true.

The "last man standing" approach to clustering is subject to additional failure cases that the cluster's quorum algorithm is designed to prevent.

