What does the log message "entering GATHER state" mean in the Red Hat High Availability Add-on?
Environment
- Red Hat Enterprise Linux Server (RHEL) 5 with High Availability or Resilient Storage Add-on
- Red Hat Enterprise Linux Server (RHEL) 6 with High Availability or Resilient Storage Add-on
Issue
- In the event of a cluster membership change, the cluster enters the GATHER state. The logs will report messages similar to the following:
Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.
Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.
- What do these messages mean in a Red Hat High Availability cluster?
Resolution
When nodes in a cluster enter the GATHER state, they send join messages to the rest of the cluster in order to form a consensus about the cluster membership. The number at the end of the log message ("from N") identifies why the node entered the GATHER state, and can be interpreted as follows:
0: Consensus timeout expired
The consensus timer expired. This timer is set on entry to the GATHER state and is reset when the COMMIT state is entered.
It means the nodes took too long to agree on the membership list.
2: Token timeout in OPERATIONAL (normal) state
3: Token timeout in GATHER state
4: Token timeout in COMMIT state
5: Token timeout in RECOVERY state
NOTE: These codes are all related. The token timer is set when the token is transmitted, and if it expires
before another message is received it will trigger one of these messages, depending on the state of
the protocol at the time.
6: Token failed to receive (ARU count > fail_to_recv_const)
We failed to receive a copy of our own token.
This will always be accompanied by a "FAILED TO RECEIVE" message.
7: mcast (data) message received from unknown node while in OPERATIONAL state
8: mcast (data) message received from unknown node while in GATHER state
A multicast data message arrived from a node that is not in the current membership. This can be caused
by a brief network split where a node is forced to leave the cluster but doesn't get fenced before the
network heals again.
9: Merge detection message received while OPERATIONAL
When nodes are missing from the membership and there are no naturally-occurring multicast messages
being sent, the messaging layer will send a periodic merge-detection message to see if any other
partitions are operating without being part of this configuration. This usually just means there
are nodes missing, but doesn't otherwise signify a problem.
10: Merge detected in GATHER
As above but while the cluster was already in transition from another node joining or leaving.
11: JOIN received while OPERATIONAL
12: JOIN received while in GATHER
13: JOIN received while in COMMIT
14: JOIN received while in RECOVERY
A JOIN message is sent by a node if GATHER times out, to bring
a new node into the cluster. These logs indicate
receipt of one of these messages while in the state named by the code (OPERATIONAL, GATHER, COMMIT or RECOVERY).
15: Interface changed state
Often seen at startup, but can also happen if an interface is taken down unexpectedly.
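The code list above can be turned into a small lookup for reading logs. The following Python sketch is illustrative only and is not a Red Hat tool: the names GATHER_REASONS, GATHER_RE and explain_gather are hypothetical, the descriptions are copied from this article, and the regular expression assumes the message format shown in the Issue section (it also tolerates padding inside the "[TOTEM]" tag, which is an assumption about other log layouts). Code 1 is not described in this article, so it is reported as undocumented rather than guessed.

```python
import re

# Reason codes exactly as described in this article.
# Code 1 is not covered here, so it is deliberately absent.
GATHER_REASONS = {
    0: "Consensus timeout expired",
    2: "Token timeout in OPERATIONAL (normal) state",
    3: "Token timeout in GATHER state",
    4: "Token timeout in COMMIT state",
    5: "Token timeout in RECOVERY state",
    6: "Token failed to receive (ARU count > fail_to_recv_const)",
    7: "mcast (data) message received from unknown node while in OPERATIONAL state",
    8: "mcast (data) message received from unknown node while in GATHER state",
    9: "Merge detection message received while OPERATIONAL",
    10: "Merge detected in GATHER",
    11: "JOIN received while OPERATIONAL",
    12: "JOIN received while in GATHER",
    13: "JOIN received while in COMMIT",
    14: "JOIN received while in RECOVERY",
    15: "Interface changed state",
}

# Assumed message format, based on the sample lines in the Issue section.
GATHER_RE = re.compile(r"\[TOTEM\s*\] entering GATHER state from (\d+)")


def explain_gather(line):
    """Return (code, description) for a GATHER log line, or None if it does not match."""
    match = GATHER_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GATHER_REASONS.get(code, "not described in this article")


if __name__ == "__main__":
    samples = [
        "Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.",
        "Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.",
    ]
    for sample in samples:
        print(explain_gather(sample))
```

Run against /var/log/messages line by line, a helper like this makes it easier to see whether a node is mostly reporting merge detection (9, 10), token timeouts (2 to 5), or consensus timeouts (0).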
Root Cause
- The GATHER state message is normally caused by a network/communication issue within the cluster, but the GATHER state can be entered for a number of reasons. The number at the end of the message (from X) indicates why the node entered the GATHER state. In the merge-detection case, the state is entered from "message_handler_memb_merge_detect", when the cluster is attempting to see whether other nodes are out on the network.
- The GATHER state is entered every time a node receives its own token back (meaning it's the only node in the ring). During this time, the node starts a timer to form and agree on a membership list of nodes in the cluster. If this timer expires, the node enters the GATHER state to see if there is another node out there, and attempts to merge with it. After the node has received its own token back a certain number of times, it will stop sending it, in which case these state changes will also stop. They are therefore a side effect of the earlier communication problem and subsequent fencing that left this node alone in the cluster.