Is it supported to scale up control plane / etcd node replicas in OpenShift 4?

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Bare metal
  • Control plane nodes
  • etcd

Issue

  • Is it possible to scale master / etcd nodes to more than 3 in OpenShift 4?
  • How to configure more than 3 master nodes in OCP 4

Resolution

It is important to know that scaling etcd does not improve performance. With 5 members, the etcd leader has to replicate every write to more members and wait for acknowledgment from a larger majority (3 members instead of 2) before committing a change. This adds latency to etcd, which in turn increases latency for the API server and cluster operators.
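The latency effect can be illustrated with a minimal sketch. It assumes a simplified model that is not etcd's actual code path: the leader sends each entry to all followers in parallel and commits once a majority of members (counting itself) has acknowledged it, and disk sync time is ignored. The function name and the round-trip times are hypothetical values chosen for illustration.

```python
def commit_latency_ms(follower_rtts_ms):
    """Approximate commit latency under the simplified model described above.

    follower_rtts_ms: round-trip time from the leader to each follower.
    """
    members = len(follower_rtts_ms) + 1   # followers plus the leader
    quorum = members // 2 + 1             # majority of all members
    acks_needed = quorum - 1              # the leader counts as one vote
    if acks_needed == 0:
        return 0.0                        # single-member cluster
    # The commit waits for the slowest of the fastest `acks_needed` followers.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical round-trip times in milliseconds.
three_members = commit_latency_ms([2.0, 5.0])
five_members = commit_latency_ms([2.0, 5.0, 7.0, 9.0])
print(three_members, five_members)
```

In this model a 3 member cluster only waits for its fastest follower, while a 5 member cluster has to wait for its second fastest one, so adding members can only keep commit latency the same or make it worse.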

Note: for easier management of control plane nodes in IPI-installed clusters, such as replacing a failing node, it is possible to use control plane machine sets (CPMS), as described in "Is it possible to define machinesets for control plane nodes in OpenShift 4".

Support for more than 3 control plane nodes

Starting with OpenShift 4.17, for clusters installed on a bare metal platform, it is possible to scale the control plane to 4 or 5 nodes as a post-installation task. The etcd Operator scales accordingly to account for the additional nodes. For more information, refer to node scaling for etcd.

Starting with OpenShift 4.18, it is possible to configure 4 and 5 node control planes with the Agent-based Installer. Refer to the optional configuration parameters for the supported number of replicas for the controlPlane parameter.

Note: as explained in node scaling for etcd, scaling a cluster to 4 or 5 control plane nodes is available only on bare metal platforms and only in the versions mentioned above. The same documentation also explains that while adding control plane nodes can increase reliability and availability, it can decrease throughput and increase latency, which affects performance.

In OpenShift 4.16 and older releases, the above is not supported, and exactly three control plane nodes (and therefore three etcd members) must be used for all production deployments.

Refer to the control plane architecture in the official documentation for additional information about the control plane.

Root Cause

With the redesign of OpenShift 4 and the performance improvements in its components, it is no longer necessary to run more than 3 control plane nodes.

Scaling etcd does not improve performance. Adding more master nodes also increases the complexity of the system, increases maintenance costs, and does not increase redundancy by a large margin. Additionally, having more etcd members puts a larger load on the network, because the etcd database has to be replicated to more nodes. etcd uses a quorum-based consensus algorithm called Raft.
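The limited gain in redundancy follows from majority quorum arithmetic. The following minimal sketch uses hypothetical helper names and is not taken from etcd or OpenShift code. It computes how many members must agree and how many member failures the cluster can survive for each control plane size.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def fault_tolerance(members):
    """Number of members that can fail while a majority is still available."""
    return members - quorum(members)


for n in (3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Under this arithmetic, 3 and 4 members both tolerate a single failure, and only 5 members tolerate two, while every step up adds more replication traffic.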


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.