Design Guidance for RHEL High Availability Clusters - Selecting the Transport Protocol
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
Recommended Prior Reading
Useful References and Guides
Introduction
This guide provides recommendations and considerations for choosing the transport protocol that RHEL High Availability cluster members use to communicate. This decision can influence the requirements of the network they use, and can affect functionality and performance in the cluster in different ways.
Ideal Scenarios for Each Transport
udp over Multicast
- Multicast is supported by the network
- Minimal concerns about network-hardware processing load that could result from multicast traffic
- Scenarios involving high-messaging-volume use cases
- Using RHEL 6 prior to Update 2, where udpu is not available
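As an illustration, on RHEL 7 a minimal corosync.conf totem stanza selecting multicast might look like the following (the cluster name, network, multicast address, and port below are placeholder examples; RHEL 6 clusters configure multicast through cluster.conf and cman instead):

```
totem {
    version: 2
    cluster_name: mycluster        # example name
    transport: udp                 # multicast is the default mode for udp
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # example network address
        mcastaddr: 239.192.100.1   # example multicast address
        mcastport: 5405            # default messaging port
    }
}
```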
udp over Broadcast
- Multicast not supported by the network, or otherwise not ideal
- Minimal concerns about network-hardware processing load that could result from broadcast traffic
- Only cluster members use this same subnet
- udpu is not desirable due to high messaging volume
- Using RHEL 6 prior to Update 2, where udpu is not available
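As a sketch of how broadcast is typically enabled: on RHEL 7 through the interface stanza of corosync.conf, and on RHEL 6 through the cman element of cluster.conf (the network address below is a placeholder):

```
totem {
    transport: udp
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # example network address
        broadcast: yes             # used in place of a multicast address
        mcastport: 5405            # default messaging port
    }
}
```

On RHEL 6, the equivalent cluster.conf setting:

```
<cman broadcast="yes"/>
```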
udpu
- Multicast or broadcast not available, or otherwise not ideal
- Concerns exist around processing load on the switches, which could be exacerbated by cluster messaging traffic
- No high-messaging-volume use cases are deployed in this cluster, or thorough testing has been done and found added overhead to be acceptable
- Other hosts that are not members of this cluster share the same network
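For illustration, selecting udpu on RHEL 7 is done in corosync.conf, with each member listed explicitly in a nodelist (the cluster name and hostnames below are placeholders); on RHEL 6 Update 2 or later the equivalent is a transport="udpu" attribute on the cman element of cluster.conf:

```
totem {
    version: 2
    cluster_name: mycluster        # example name
    transport: udpu                # unicast UDP
}
nodelist {
    node {
        ring0_addr: node1.example.com   # example hostname
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com   # example hostname
        nodeid: 2
    }
}
```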
Factors to Consider When Choosing Transport Protocol
Does My Network Support Multicast?
Multicast is often not supported or enabled on networks, most commonly for one of these reasons:
- Organizational policy requires it be disabled for security reasons or to minimize load
- No other use cases deployed on the network require it, so it was never set up fully
- Configuration may be complex, especially when dealing with multiple interconnected switches
- The cloud platform's network does not support multicast traffic, as is the case with Amazon VPC and Microsoft Azure networks
When multicast is not available, the options are udpu or "udp over broadcast".
Does My Network Support Broadcast?
Many organizations also disable broadcast, mainly because it allows an individual host to spam traffic to all other hosts on the same subnet, forcing them to expend unnecessary processing cycles handling it. Broadcast packets may also incur higher per-packet processing load than direct host-to-host packets, which can lead organizations to disable broadcast to keep switches from being overloaded.
Broadcast messaging is not supported on IPv6 networks.
When broadcast is not available, the options are udpu or "udp over multicast".
Does My Network Support udpu?
There are no special requirements for the network when using udpu, other than needing an open path from each node to each other node over the cluster-communication port: 5405/udp by default.
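For example, assuming the default ports, that path can be opened with the distribution's firewall tooling (the commands below require root and are illustrative; corosync also uses the port one below the messaging port, 5404/udp by default):

```
# RHEL 7: firewalld ships a high-availability service covering the
# cluster ports, including 5404-5405/udp
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# RHEL 6: an equivalent iptables rule for the messaging ports
iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT
```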
Does My RHEL HA Installation Support My Preferred Transport?
See the overview of transport protocols for details on supported releases for each. In a nutshell:
- RHEL 7, or RHEL 6 Update 2 and later: udpu, or udp over broadcast or multicast
- RHEL 6 prior to Update 2: udp over broadcast or multicast
Where Should Processing Load Be Distributed? Switches vs Hosts
Whether the load on the switch is higher from udpu or one of the udp options depends on the specifics of the environment: udpu doesn't trigger the same higher-computational-overhead handling that multicast or broadcast might, but it also sends a higher volume of packets from node to node. In other words, when considering load on the switches, udpu vs udp does not have an obvious and clear winner. For further consideration:
- "udp over Multicast" requires the network hardware to perform extra work to maintain group membership and deliver messages to subscribed hosts, which can become problematic if the switches in use for the cluster's network are near or at processing capacity. If the switches are dedicated to just this cluster, this often does not matter. If they serve many hosts, then consider how much load the multicast traffic from the cluster will add - and whether shifting that work to the hosts themselves with udpu would be better.
- "udp over Broadcast" may similarly trigger higher processing workloads on switches, as they have to distribute single messages out to many different hosts. It can also create higher processing volume on other hosts outside the cluster but on the same subnet, as they will receive these messages and have to do at least some initial processing to drop them.
- udpu can put higher computational load on the nodes themselves, as they have to craft packets for each of the other nodes in the cluster for every message, whereas udp over multicast or broadcast only requires sending the message once and letting the network distribute it. This higher processing overhead can strain a heavily loaded cluster node if there is a high volume of messages. Even in a two-node cluster, more local processing is required for udpu than for udp. So, if there will potentially be a high volume of messages and there is concern about minimizing processing load on a node, udp may be a better fit than udpu.
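The tradeoff can be seen with back-of-the-envelope arithmetic: under udpu a sender transmits one copy of each message per peer, while under udp it transmits a single packet and the network fans it out. A quick sketch, using an arbitrary example cluster size:

```shell
# Packet count originated by one node per message, by transport.
# Assumption: udpu sends one unicast copy per peer, while udp over
# multicast/broadcast sends a single packet that the network distributes.
nodes=8                        # hypothetical cluster size
udpu_packets=$((nodes - 1))    # one copy to every other node
udp_packets=1                  # switch/network performs the fan-out
echo "udpu: $udpu_packets packets per message; udp: $udp_packets"
```

The gap widens linearly with node count, which is why the node-count discussion below matters more for udpu than for the udp modes.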
Will My Cluster's Use Case Send a High Volume of Messages?
High-messaging-volume use cases include:
- Use of GFS2 with applications that utilize POSIX locking
- clvmd: LVM volumes in volume groups marked as "clustered"
  - NOTE: This only generates messages during LVM operations, so if those are rare, messaging volume may not be a concern
- cmirror: LVM mirrored volumes in volume groups marked as "clustered"
- pacemaker clusters with many resources or frequent, ongoing resource or membership activity
Will My Cluster's Node Count Have an Effect on Messaging?
The size of a cluster can impact the volume of messages transmitted throughout its members. More nodes can mean more join sequences which each would create a small burst of messages, and - probably more significantly - there may be more nodes with applications sending messages to each other. With any of the protocols, a higher volume of messages will likely mean higher computational load somewhere - whether that be on the network hardware or the nodes - so in larger-sized clusters it is especially important to consider where that load should be distributed and what volume of messages should be expected, as discussed in other sections.
With a larger cluster, udpu is going to have much more work to do on individual nodes for each message that must be sent than it would have in a smaller cluster. If there is an expectation that these nodes may be very heavily utilized in the area of CPU or network resources, then it could be beneficial to offload that computation to the network through one of the udp protocols. If the network will be heavily utilized and the nodes are expected to have plenty of free cycles, then udpu might provide more benefit.
The main point is: it is important to be aware that larger clusters can both influence the volume of messages and how much work must take place to process each message, so that the cluster can be planned with adequate resources and the work can be distributed to the best location.
Will The Same Network Be Used By Other Hosts That Are Not Members of This Cluster?
If multiple distinct clusters will be supported by a single logical network, then messaging within each cluster can interfere with the operation of nodes in the other clusters, and the choice of transport can influence the level of impact.
When using "udp over broadcast", messages are sent to the broadcast address, which in turn delivers them to all hosts on that subnet. If the other hosts receiving that traffic are not cluster nodes, then they almost certainly aren't listening on the correct port and thus would just discard the packets; however that may still create extra load on that system as it initially processes the packet leading up to dropping it. If the other systems sharing the network are cluster nodes, then they may be listening on that port, and thus may go through full cluster-processing of the message before determining that it is not intended for their cluster. Configuring these separate clusters to use different communication ports can avoid that problem, but there will still be the problem of those hosts receiving traffic not intended for them.
With "udp over multicast", the multicast facility ensures that messages should only be delivered to the hosts that have subscribed to a particular address, so that at least prevents non-cluster hosts from receiving the traffic. However, if another cluster shares the same network and uses the same cluster name or is configured with the same multicast address, then those two clusters will both be subscribed to that address and will receive each other's messages. Just as with broadcast, the nodes listening on the same port will receive messages for another cluster and will have to process them before determining they're not valid for this cluster, causing unnecessary computational load. If these clusters must share the network, then ensuring they use different multicast addresses can prevent them from interfering with each other.
In summary, when considering the network layout for clusters, the preferred design is to dedicate a VLAN or subnet for each cluster and not share it with other hosts, and then it does not need to influence the choice of transport protocol. In the case where the network must be shared, "udp over multicast" or udpu is a better choice than "udp over broadcast".
Can I Monitor Processing Volume that Results from Messaging?
Capturing traffic with tcpdump or similar on the cluster interconnect can highlight the messaging traffic flowing through the cluster. If using multicast, filter by the cluster's multicast address; similarly, filter by the broadcast address if using broadcast. With udpu it is not as straightforward to filter out messages in a capture, as messages would look similar to token traffic without decrypting the stream. But comparing the overall packet volume during different workloads - peak activity vs. an idle period when nodes have joined but aren't doing anything - may still hint at whether messaging volume is high enough at certain times to warrant consideration.
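For example, capture invocations along these lines can isolate the messaging traffic (the interface name, addresses, and port below are placeholders to be replaced with the environment's actual values):

```
# udp over multicast: filter on the cluster's multicast address
tcpdump -i eth0 -n host 239.192.100.1 and udp port 5405

# udp over broadcast: filter on the subnet's broadcast address
tcpdump -i eth0 -n host 192.168.1.255 and udp port 5405

# udpu: no single address to filter on; watch overall volume on the port
tcpdump -i eth0 -n udp port 5405
```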
One can also monitor CPU usage of the corosync process in RHEL 6 and 7 clusters. With higher messaging volumes it may use more CPU, and that usage can be compared across different workloads or between udpu and udp.
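As a rough sketch, the corosync process's accumulated CPU time can be sampled from /proc and compared across workloads; the helper below works for any PID, and on a cluster node "pidof corosync" would supply the PID:

```shell
# Print a process's accumulated CPU time in clock ticks (user + system),
# read from fields 14 and 15 of /proc/<pid>/stat.
cpu_ticks() {
    awk '{print $14 + $15}' "/proc/$1/stat"
}

# On a cluster node one would sample corosync, e.g.:
#   cpu_ticks "$(pidof corosync)"
# Sampling twice across an interval shows CPU consumed in between;
# demonstrated here on the current shell's own PID:
before=$(cpu_ticks $$)
sleep 1
after=$(cpu_ticks $$)
echo "ticks consumed: $((after - before))"
```

Comparing such samples taken during peak messaging activity against an idle baseline gives a concrete view of the overhead discussed above.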
How Can I Test the Impact that High Messaging Volume Would Have In My Environment?
The best method is simply to exercise the use case that will be served by this cluster. If it's hosting an application, run that application with as much data or work to complete as possible. If it's serving NFS, set up as many clients as possible to exercise it.
If a generic use case is needed to flood high volumes of messaging traffic, then one of a few strategies could help:
- Set up as many clustered volume groups as possible with clvmd running, then run LVM commands (like vgs, vgchange -a, etc.) frequently across the nodes.
- Set up a GFS2 file system and issue frequent/continuous POSIX lock calls through fcntl().
- Set up a cluster-mirrored volume with cmirror and clvmd and push high volumes of I/O through it across multiple nodes.
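As one hypothetical driver for the first strategy, a trivial loop run concurrently on each node can keep clvmd messaging busy (purely illustrative; it assumes clustered volume groups are active with clvmd running):

```
# Illustrative clvmd load generator - run on every node simultaneously.
while true; do
    vgs --noheadings > /dev/null    # each query takes cluster-wide LVM locks
    sleep 1
done
```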