How do I configure the consensus timeout in a Red Hat High Availability cluster?
Environment
- Red Hat Enterprise Linux 5 (with the High Availability Add-on)
- Red Hat Enterprise Linux 6 (with the High Availability Add-on)
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)
- Red Hat Enterprise Linux 9 (with the High Availability Add-on)
Issue
- My cluster is taking longer than the totem token timeout to recognize and recover from a node failure.
- Why does it take longer than the token timeout to recognize a node failure after an "A processor failed, forming new configuration" message?
- How can I change the consensus timeout value?
Resolution
When the cluster detects a membership change (e.g., a processor failure/node leave event, or a node join event), it waits up to the consensus timeout for consensus to be achieved before starting a new round of membership configuration.
In the case of node failure, the cluster detects the failure after the totem token timeout expires. Then it waits for the consensus timeout to expire before forming a new membership. If that membership still does not include the failed node, then under normal circumstances the node will be fenced.
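As a rough illustration of the sequence above, the delay before a new membership forms after a node failure can be estimated as the token timeout plus the consensus timeout. The sketch below is not part of any cluster tooling: the function name and the sample values are assumptions chosen for illustration, and it ignores fencing and service recovery time.

```python
def time_to_new_membership_ms(token_ms: int, consensus_ms: int) -> int:
    """Estimate the delay (ms) between a node failing and a new membership forming.

    The failure is detected once the token timeout expires; the cluster then
    waits for the consensus timeout before forming the new membership.
    """
    if token_ms <= 0 or consensus_ms <= 0:
        raise ValueError("timeouts must be positive")
    return token_ms + consensus_ms


# Hypothetical values: a 10000 ms token timeout and a 12000 ms consensus timeout.
print(time_to_new_membership_ms(10000, 12000))  # 22000
```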
Updating the consensus timeout
RHEL 5 and 6
Edit the <totem> element of the /etc/cluster/cluster.conf file on one node, setting the consensus attribute to the desired value in milliseconds, and increment config_version so that the change can be propagated. For example, to set consensus to a value of 3000, you would make a change like the following.
# BEFORE
<cluster name="mycluster" config_version="10">
...
<totem/>
...
</cluster>
# AFTER
<cluster name="mycluster" config_version="11">
...
<totem consensus="3000"/>
...
</cluster>
Then propagate the change to all nodes. Finally, restart the cluster on all nodes. For more details, refer to the following solutions.
- How can I propagate changes I've made to /etc/cluster/cluster.conf to all the nodes in my cluster?
- What order do I need to start/stop the cluster services in?
RHEL 7 or later
Edit the totem section of the /etc/corosync/corosync.conf file on one node, setting consensus to the desired value in milliseconds. For example, to set consensus to a value of 3000, you would make a change like the following.
# BEFORE
totem {
...
}
# AFTER
totem {
...
consensus: 3000
...
}
Then sync the updated configuration to the rest of the nodes by running the following on the node where you made the change.
# pcs cluster sync
Finally, restart the cluster on all nodes. (Note: It may be possible to simply reload the configuration by running pcs cluster reload corosync instead of restarting the cluster. However, it is not clear whether that option was available in the earliest RHEL 7 releases.)
# pcs cluster stop --all
# pcs cluster start --all
RHEL 8 or 9
A new pcs command was added in erratum RHEA-2021:1737, which shipped pcs-0.10.8-1.el8 (see BZ#1667061). With that package version or later, the following command modifies the consensus timeout and can be run while corosync is running on all nodes.
# pcs cluster config update totem consensus=3000
Root Cause
RHEL 5 and 6
In RHEL 5.4, the cluster consensus timeout value is set to 4800 ms by default, making total failover time equal to token + 4800 ms. The default token timeout is 10 seconds. So this results in a total timeout of 14.8 seconds before the cluster can resume operations after a token is lost.
In RHEL 5.5, the consensus value was changed to token * 2, making it greater than the token timeout. This was in order to account for clusters larger than two nodes, which required a higher value.
This has an adverse impact on the total failover time for clusters, which now takes token * 3 (that is, token + token * 2) instead of the previous token + 4800 ms. The impact is greater on clusters where the token timeout is set to a higher value.
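To make the difference concrete, the following sketch compares the two defaults described above for a couple of token values. The function names and the 30000 ms sample are assumptions for illustration only; the 10000 ms value is the default token timeout mentioned above.

```python
RHEL_5_4_CONSENSUS_MS = 4800  # fixed default consensus described for RHEL 5.4


def failover_rhel_5_4_ms(token_ms: int) -> int:
    """Total failover time with the RHEL 5.4 default: token + 4800 ms."""
    return token_ms + RHEL_5_4_CONSENSUS_MS


def failover_rhel_5_5_ms(token_ms: int) -> int:
    """Total failover time with the RHEL 5.5 default: token + (token * 2)."""
    return token_ms + 2 * token_ms


# 10000 ms is the default token timeout; 30000 ms is a hypothetical higher setting.
for token_ms in (10000, 30000):
    print(token_ms, failover_rhel_5_4_ms(token_ms), failover_rhel_5_5_ms(token_ms))
```

With the default 10000 ms token this gives 14800 ms versus 30000 ms, and the gap widens as the token timeout grows.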
As of cman-2.0.115-34.el5 or later on RHEL 5 and throughout RHEL 6, the default consensus value is calculated based on the token timeout and the number of nodes in the cluster. In most environments, this automatically calculated default does not need to be changed.
However, the default consensus value in a two-node cluster differs from the default value in a cluster of three or more nodes. Therefore, if you later add nodes to a two-node cluster, then you should restart the cluster so that the default consensus value is updated to a reasonable value for larger clusters. If you do not want to restart the cluster, then you should likely configure an updated consensus value manually.
When consensus is calculated automatically, the following rules are used:
- If there are two or fewer nodes, consensus is calculated as token * 0.2, with a floor of 200 ms and a ceiling of 2000 ms.
- If there are three or more nodes, consensus is calculated as token + 2000 ms.
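The rules above can be sketched as a small function. This is only an illustration of the arithmetic, not the code cman uses: the function name is hypothetical, and the use of integer division for token * 0.2 is an assumption about rounding.

```python
def default_consensus_rhel5_6_ms(token_ms: int, node_count: int) -> int:
    """Approximate the automatically calculated consensus (ms) on RHEL 5 and 6."""
    if token_ms <= 0 or node_count < 1:
        raise ValueError("token_ms must be positive and node_count at least 1")
    if node_count <= 2:
        # token * 0.2, clamped to the 200 ms floor and the 2000 ms ceiling.
        scaled = token_ms // 5  # assumption: integer rounding
        return max(200, min(2000, scaled))
    # Three or more nodes: token + 2000 ms.
    return token_ms + 2000


print(default_consensus_rhel5_6_ms(10000, 2))  # 2000
print(default_consensus_rhel5_6_ms(10000, 3))  # 12000
```

The jump from 2000 ms to 12000 ms in this example shows why a two-node cluster that grows to three nodes should be restarted or given a manually configured consensus value.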
If you configure consensus manually for a cluster with 3 or more nodes, a value greater than totem's token should be used.
RHEL 7 or later
The corosync.conf(5) man page reports the following:
consensus
This timeout specifies in milliseconds how long to wait for consensus to be achieved before starting a new round of membership configuration. The minimum value for consensus must be 1.2 * token. This value will be automatically calculated at 1.2 * token if the user doesn't specify a consensus value.
For two node clusters, a consensus larger than the join timeout but less than token is safe. For three node or larger clusters, consensus should be larger than token. There is an increasing risk of odd membership changes, which still guarantee virtual synchrony, as node count grows if consensus is less than token.
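The man page guidance can be turned into a quick sanity check before choosing a value. The sketch below is an assumption-laden illustration rather than corosync's own logic: the function names are hypothetical, the 1.2 * token default is computed with integer arithmetic (corosync's exact rounding may differ), and the join timeout is passed in by the caller, with 50 ms used here purely as a sample value.

```python
def default_consensus_ms(token_ms: int) -> int:
    """Default consensus per corosync.conf(5): 1.2 * token (integer arithmetic)."""
    return token_ms * 12 // 10


def check_consensus(token_ms: int, consensus_ms: int,
                    node_count: int, join_ms: int) -> list:
    """Return warnings for a consensus value, following the man page guidance."""
    warnings = []
    if node_count <= 2:
        # Two-node clusters: larger than join (and possibly below token) is safe.
        if consensus_ms <= join_ms:
            warnings.append("consensus should be larger than the join timeout")
    elif consensus_ms <= token_ms:
        # Three or more nodes: consensus should be larger than token.
        warnings.append("consensus should be larger than token "
                        "for three or more nodes")
    return warnings


# Hypothetical values: 3000 ms token, 50 ms join timeout.
print(default_consensus_ms(3000))            # 3600
print(check_consensus(3000, 2000, 3, 50))    # one warning
print(check_consensus(3000, 2000, 2, 50))    # []
```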
Diagnostic Steps
See Issue section.