Critical DaemonSets Missing Universal Toleration

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.1 - 4.5

Issue

When a node is configured with taints, several critical cluster DaemonSets cannot be scheduled on that node, which can cause the cluster to become unstable.

The affected DaemonSets are:

  1. The machine-config-daemon DaemonSet in the openshift-machine-config-operator project
  2. The node-ca DaemonSet in the openshift-image-registry project
  3. The dns-default DaemonSet in the openshift-dns project*

This issue is currently being tracked in Bugzilla.

If OpenShift Container Storage (OCS) is deployed, it includes the following DaemonSets, which also cannot be scheduled on tainted nodes or on master nodes (with the exception of storage nodes carrying the node.ocs.openshift.io/storage=true:NoSchedule taint):

  1. The csi-cephfsplugin DaemonSet in the openshift-storage project
  2. The csi-rbdplugin DaemonSet in the openshift-storage project

* While the dns-default DaemonSet is affected by this issue, testing has shown that it does not critically impact cluster functionality or stability.
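To check whether a cluster is affected, a sketch such as the following can help. It assumes oc is on the PATH and logged in with cluster-admin rights; the helper name ds_coverage is hypothetical. It lists each node's taint keys and effects, then compares each DaemonSet's desired pod count (status.desiredNumberScheduled) with the total number of nodes. A desired count below the node count suggests that some nodes are excluded, for example by a taint the DaemonSet does not tolerate, although a nodeSelector on the DaemonSet can also legitimately narrow the set of target nodes.

```shell
#!/usr/bin/env bash
# Sketch: show node taints and how many nodes each affected DaemonSet targets.
# Assumes `oc` is on PATH and logged in as a cluster-admin.
# ds_coverage is a hypothetical helper name, not part of oc.

# Print "<namespace>/<daemonset>: desired N of M nodes".
# A desired count below the node count means some nodes are excluded,
# for example by a taint the DaemonSet does not tolerate.
ds_coverage() {
  local ns="$1" ds="$2" desired nodes
  if ! desired="$(oc get ds "$ds" -n "$ns" -o jsonpath='{.status.desiredNumberScheduled}' 2>/dev/null)"; then
    echo "$ns/$ds: not found"
    return 0
  fi
  nodes="$(oc get nodes --no-headers 2>/dev/null | wc -l)"
  # $(( )) strips any padding that wc adds around the count.
  echo "$ns/$ds: desired ${desired} of $((nodes)) nodes"
}

# List every node with the keys and effects of its taints.
oc get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect' 2>/dev/null

ds_coverage openshift-machine-config-operator machine-config-daemon
ds_coverage openshift-image-registry node-ca
ds_coverage openshift-dns dns-default
ds_coverage openshift-storage csi-cephfsplugin
ds_coverage openshift-storage csi-rbdplugin
```

On an unaffected cluster each line should report a desired count equal to the node count; the OCS entries report "not found" when OCS is not deployed.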

Resolution

To allow the DaemonSets to be scheduled on all nodes, a universal toleration must be applied. A universal toleration (a toleration with operator Exists and no key or effect) allows the resource to be scheduled on every node, regardless of the taints that node carries.
Use universal tolerations with care: the resource will be scheduled on all nodes, including the master nodes.
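Whether a DaemonSet already carries a universal toleration can be checked with a sketch like the one below. It assumes oc is logged in as a cluster-admin; the function name has_universal_toleration is hypothetical. It prints each toleration's key, operator, and effect through oc's jsonpath output and looks for an entry with operator Exists and an empty key and effect, which is the form that tolerates every taint. A toleration such as key node-role.kubernetes.io/master with operator Exists only tolerates that one taint key, so it is not counted.

```shell
#!/usr/bin/env bash
# Sketch: report whether each affected DaemonSet tolerates every taint.
# Assumes `oc` is on PATH and logged in as a cluster-admin.
# has_universal_toleration is a hypothetical helper name, not part of oc.

# Succeeds (exit 0) if the DaemonSet has a toleration with no key,
# operator "Exists", and no effect; fails otherwise or if it is missing.
has_universal_toleration() {
  local ns="$1" ds="$2"
  oc get ds "$ds" -n "$ns" \
    -o jsonpath='{range .spec.template.spec.tolerations[*]}{.key}{"\t"}{.operator}{"\t"}{.effect}{"\n"}{end}' 2>/dev/null \
    | awk -F'\t' '$1 == "" && $2 == "Exists" && $3 == "" { found = 1 } END { exit !found }'
}

for target in \
  openshift-machine-config-operator/machine-config-daemon \
  openshift-image-registry/node-ca \
  openshift-dns/dns-default \
  openshift-storage/csi-cephfsplugin \
  openshift-storage/csi-rbdplugin; do
  ns="${target%%/*}"   # text before the slash
  ds="${target#*/}"    # text after the slash
  if has_universal_toleration "$ns" "$ds"; then
    echo "$ns/$ds: universal toleration present"
  else
    echo "$ns/$ds: no universal toleration found"
  fi
done
```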

Apply the following patches to allow the required DaemonSets to be scheduled on all nodes. Note that the two storage DaemonSets will then also be scheduled on the master nodes.

machine-config-daemon

oc patch ds machine-config-daemon -n openshift-machine-config-operator --type=merge -p '{"spec": {"template": { "spec": {"tolerations":[{"operator":"Exists"}]}}}}'

node-ca

oc patch ds node-ca -n openshift-image-registry --type=merge -p '{"spec": {"template": { "spec": {"tolerations":[{"operator":"Exists"}]}}}}'

csi-cephfsplugin

oc patch ds csi-cephfsplugin -n openshift-storage --type=merge -p '{"spec": {"template": { "spec": {"tolerations":[{"operator":"Exists"}]}}}}'

csi-rbdplugin

oc patch ds csi-rbdplugin -n openshift-storage --type=merge -p '{"spec": {"template": { "spec": {"tolerations":[{"operator":"Exists"}]}}}}'
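The four patches above can also be applied in a single pass. The following is a sketch assuming oc is logged in as a cluster-admin; the function name apply_universal_toleration is hypothetical. It skips any DaemonSet that does not exist (for example csi-cephfsplugin and csi-rbdplugin when OCS is not deployed) and leaves dns-default untouched. Because --type=merge sends a JSON merge patch, which replaces arrays as a whole, any existing tolerations are overwritten by the single universal toleration, which already covers them.

```shell
#!/usr/bin/env bash
# Sketch: apply the universal toleration to every affected DaemonSet that exists.
# Assumes `oc` is on PATH and logged in as a cluster-admin.
# apply_universal_toleration is a hypothetical helper name, not part of oc.
# dns-default is deliberately excluded: its scheduling is controlled by the DNS operator.

# JSON merge patch: the tolerations array is replaced wholesale with one
# key-less "Exists" toleration, which matches every taint.
TOLERATION_PATCH='{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'

apply_universal_toleration() {
  local ns="$1" ds="$2"
  # Skip DaemonSets that are absent, e.g. when OCS is not installed.
  if ! oc get ds "$ds" -n "$ns" >/dev/null 2>&1; then
    echo "skip: $ns/$ds not found"
    return 0
  fi
  oc patch ds "$ds" -n "$ns" --type=merge -p "$TOLERATION_PATCH"
}

for target in \
  openshift-machine-config-operator/machine-config-daemon \
  openshift-image-registry/node-ca \
  openshift-storage/csi-cephfsplugin \
  openshift-storage/csi-rbdplugin; do
  # Split "namespace/daemonset" into its two parts.
  apply_universal_toleration "${target%%/*}" "${target#*/}"
done
```

After patching, a command such as oc rollout status ds/node-ca -n openshift-image-registry (and likewise for the other DaemonSets) should wait for the updated pods to finish rolling out, including onto the previously excluded nodes.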

dns-default

There is currently no workaround for applying a universal toleration to the dns-default DaemonSet, because its scheduling is controlled by the DNS operator. Applying one would require modifying the operator, which would result in an unsupported configuration. However, because the dns-default DaemonSet does not critically affect cluster functionality, a workaround is not required at this time.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.