How to add toleration for the "non-ocs" taints to the OpenShift Data Foundation pods?

Worker nodes may already carry a non-ODF taint, and an administrator may still want OpenShift Data Foundation (ODF, previously known as OpenShift Container Storage) pods to run on those tainted nodes. In that case, a toleration for the taint must be added to the ODF pods. The steps below describe how to add it.

NOTE:

  • On a new ODF installation, install ODF without taints on the OCS nodes first. Then add the tolerations as described below (including tolerations for any infra taints that will be set at the end), check that the deployments and pods were reconciled with the tolerations, and only then add the taints to the nodes. If the taints are added before the tolerations, OpenShift may fail to reconcile the toleration changes into the deployments and pods.

  • To schedule only specific pods on specific nodes, use both the taint/toleration and nodeSelector features: the taint keeps non-target pods off the nodes, and the nodeSelector places the desired pods on them. This article covers only the taint/toleration configuration. For information on how to set up nodeSelector in ODF, please refer to this knowledge article.
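
The reconciliation check mentioned in the first note could be sketched as follows. This is only an illustrative sketch, not part of the original procedure: the function names `list_pod_toleration_keys` and `pods_missing_toleration` are hypothetical, and `xyz` is the example taint key used throughout this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every pod in a namespace together with the
# keys of its tolerations (multiple keys are printed comma-separated).
list_pod_toleration_keys() {
  local ns="${1:-openshift-storage}"
  oc get pods -n "$ns" \
    -o 'custom-columns=NAME:.metadata.name,TOLERATION_KEYS:.spec.tolerations[*].key'
}

# Hypothetical helper: read that listing on stdin and print the names of
# pods whose toleration keys do not include the given key.
pods_missing_toleration() {
  local key="$1"
  awk -v key="$key" 'NR > 1 {
    found = 0
    n = split($2, keys, ",")
    for (i = 1; i <= n; i++) if (keys[i] == key) found = 1
    if (!found) print $1
  }'
}

# Usage (against a cluster you are logged in to):
#   list_pod_toleration_keys openshift-storage | pods_missing_toleration xyz
```

An empty result would suggest that every pod in the namespace already tolerates the taint, so adding the taint to the nodes should not evict or block them.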

To remove a taint (example with key=xyz, value=true and effect=NoSchedule):

# oc adm taint nodes node_name xyz=true:NoSchedule-

To add a taint, example:

# oc adm taint nodes node_name xyz=true:NoSchedule
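
After adding or removing a taint, it may help to confirm what each node currently carries. The following is a minimal sketch under that assumption; the helper name `show_node_taints` is hypothetical and not part of the original article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every node with the keys and effects of its taints.
# Nodes without taints show <none> in both columns.
show_node_taints() {
  oc get nodes \
    -o 'custom-columns=NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}

# Usage (against a cluster you are logged in to):
#   show_node_taints
```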

For ODF 4.21 and above

Follow the sections "For ODF 4.16 and above" and "For ODF 4.19 and above". The additional change needed for ODF 4.21 or above is:

a. When you edit the StorageCluster CR in step 3, also add the toleration below alongside the others mentioned in that section. This toleration is for the blackbox-exporter pod:

    blackbox-exporter:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"

For ODF 4.19 and above

Follow the section "For ODF 4.16 and above". The additional changes needed for ODF 4.19 or above are:

a. When you edit the StorageCluster CR in step 3, also add the toleration below alongside the others mentioned in that section. This toleration is for the ocs-provider-server pod:

    api-server:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"

b. To add tolerations to the CSI pods, do not pass CSI-specific tolerations in the StorageCluster CR. Instead, list the driver CRs and edit each one:

# oc get driver
NAME                                    AGE
openshift-storage.cephfs.csi.ceph.com   6d23h
openshift-storage.nfs.csi.ceph.com      6d23h
openshift-storage.rbd.csi.ceph.com      6d23h

# oc edit driver <driver_name> -n openshift-storage

Syntax:

spec:
  controllerPlugin:
    tolerations:
    <your toleration>
  nodePlugin:
    tolerations:
    <your toleration>

For Example:

spec:
  controllerPlugin:
    tolerations:
    - effect: NoSchedule
      key: xyz
      operator: Equal
      value: "true"
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  nodePlugin:
    tolerations:
    - effect: NoSchedule
      key: xyz
      operator: Equal
      value: "true"
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
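
As a non-interactive alternative to `oc edit`, the same driver change could be sketched with `oc patch`. This is an assumption-laden sketch, not the article's documented method: the helper names `driver_patch` and `patch_driver` are hypothetical, and a merge patch replaces the whole `tolerations` list rather than appending to it.

```shell
#!/usr/bin/env bash
# Tolerations from the example above, written as a JSON array.
TOLERATIONS='[{"effect":"NoSchedule","key":"xyz","operator":"Equal","value":"true"},{"effect":"NoSchedule","key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true"}]'

# Hypothetical helper: build a merge patch that sets the same tolerations
# on both controllerPlugin and nodePlugin.
driver_patch() {
  printf '{"spec":{"controllerPlugin":{"tolerations":%s},"nodePlugin":{"tolerations":%s}}}' \
    "$1" "$1"
}

# Hypothetical helper: apply the patch to one driver CR.
# Note: a merge patch replaces the existing tolerations list, it does not append.
patch_driver() {
  oc patch driver "$1" -n openshift-storage --type merge \
    -p "$(driver_patch "$TOLERATIONS")"
}

# Usage (against a cluster you are logged in to):
#   patch_driver openshift-storage.rbd.csi.ceph.com
#   patch_driver openshift-storage.cephfs.csi.ceph.com
```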

Note: ODF 4.19 has a known issue, tracked in DFBUGS-3797, where the NooBaa DB pods do not honour custom tolerations. This is fixed in ODF 4.19.5. To apply custom tolerations to the NooBaa pods, install or upgrade the cluster to ODF 4.19.5 or later.

For ODF 4.18 and above

Follow the section "For ODF 4.16 and above". The additional change needed for ODF 4.18 or above is:

a. When you edit the StorageCluster CR in step 3, also add the toleration below alongside the others mentioned in that section. This toleration is for the ocs-provider-server pod:

    api-server:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"

For ODF 4.16 and above

  1. Get the list of subscriptions:
# oc get subs -n openshift-storage

Example Output

# oc get sub
NAME                                                                         PACKAGE                   SOURCE             CHANNEL
mcg-operator-stable-4.16-redhat-operators-openshift-marketplace              mcg-operator              redhat-operators   stable-4.16
ocs-client-operator-stable-4.16-redhat-operators-openshift-marketplace       ocs-client-operator       redhat-operators   stable-4.16
ocs-operator-stable-4.16-redhat-operators-openshift-marketplace              ocs-operator              redhat-operators   stable-4.16
odf-csi-addons-operator-stable-4.16-redhat-operators-openshift-marketplace   odf-csi-addons-operator   redhat-operators   stable-4.16
odf-operator                                                                 odf-operator              redhat-operators   stable-4.16
odf-prometheus-operator-stable-4.16-redhat-operators-openshift-marketplace   odf-prometheus-operator   redhat-operators   stable-4.16
recipe-stable-4.16-redhat-operators-openshift-marketplace                    recipe                    redhat-operators   stable-4.16
rook-ceph-operator-stable-4.16-redhat-operators-openshift-marketplace        rook-ceph-operator        redhat-operators   stable-4.16
  2. Edit the odf-operator subscription to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-operator, rook-ceph-operator, csi-addons-controller-manager, ux-backend-server and noobaa-operator pods:
# oc edit sub odf-operator -n openshift-storage

Syntax:

spec:
  config:
    tolerations:
    <your toleration>

For Example:

spec:
  config:
    tolerations:
    - effect: NoSchedule
      key: xyz
      operator: Equal
      value: "true"
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  3. Edit the StorageCluster CR to add the desired toleration for the Rook Ceph, Noobaa, csi-plugin, csi-provisioner, metrics-exporter and toolbox pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage

For Example:

spec:
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    csi-plugin:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    csi-provisioner:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    metrics-exporter:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    rgw:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    toolbox:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
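
The StorageCluster example repeats the same tolerations under every placement key, so the edit could also be generated rather than typed. The sketch below assumes that a JSON merge patch is acceptable in your environment; the helper names `placement_patch` and `patch_storagecluster` are hypothetical, and the component list simply mirrors the example above.

```shell
#!/usr/bin/env bash
# Tolerations from the example above, written as a JSON array.
TOLERATIONS='[{"effect":"NoSchedule","key":"xyz","operator":"Equal","value":"true"},{"effect":"NoSchedule","key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true"}]'

# Hypothetical helper: build a merge patch with one placement entry per
# component name. First argument is the tolerations JSON array, the rest
# are placement keys.
placement_patch() {
  local tolerations="$1"; shift
  local body="" sep="" component
  for component in "$@"; do
    body="${body}${sep}\"${component}\":{\"tolerations\":${tolerations}}"
    sep=","
  done
  printf '{"spec":{"placement":{%s}}}' "$body"
}

# Hypothetical helper: apply the generated patch to the StorageCluster CR.
# The tolerations list of each named component is replaced, not appended to.
patch_storagecluster() {
  oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge \
    -p "$(placement_patch "$TOLERATIONS" all csi-plugin csi-provisioner mds metrics-exporter noobaa-core rgw toolbox)"
}

# Usage (against a cluster you are logged in to):
#   patch_storagecluster
```

For the newer versions described earlier, further keys such as `api-server` or `blackbox-exporter` could be appended to the component list in the same way.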

For ODF 4.15

  1. Get the list of subscriptions:
# oc get subs -n openshift-storage

Example Output

# oc get sub
NAME                                                                         PACKAGE                   SOURCE             CHANNEL
mcg-operator-stable-4.15-redhat-operators-openshift-marketplace              mcg-operator              redhat-operators   stable-4.15
ocs-operator-stable-4.15-redhat-operators-openshift-marketplace              ocs-operator              redhat-operators   stable-4.15
odf-csi-addons-operator-stable-4.15-redhat-operators-openshift-marketplace   odf-csi-addons-operator   redhat-operators   stable-4.15
odf-operator                                                                 odf-operator              redhat-operators   stable-4.15
  2. Edit all the listed subscriptions to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-metrics-exporter, ocs-operator, rook-ceph-operator, csi-addons-controller-manager, ux-backend-server and noobaa-operator pods:
# oc edit sub <subscription_name> -n openshift-storage

Syntax:

spec:
  config:
    tolerations:
    <your toleration>

For Example:

spec:
  config:
    tolerations:
    - effect: NoSchedule
      key: xyz
      operator: Equal
      value: "true"
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  3. Edit the configmap of rook-ceph-operator to add the toleration to the csi-plugin & csi-provisioner pods:
# oc edit configmap rook-ceph-operator-config -n openshift-storage

Syntax:

data:
  CSI_PLUGIN_TOLERATIONS: |2-
    <toleration>
  CSI_PROVISIONER_TOLERATIONS: |2-
    <toleration>

For Example:

data:
  CSI_PLUGIN_TOLERATIONS: |2-

    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule

Note: Do not remove the node.ocs.openshift.io/storage toleration for CSI pods while adding any new toleration.

  4. Edit the StorageCluster CR to add the desired toleration for the Rook Ceph, Noobaa and toolbox pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage

For Example:

spec:
  placement:
    toolbox:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    all:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    rgw:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
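
Because step 2 of this section applies the same toleration to every listed subscription, the edit could be looped instead of repeated by hand. This is a sketch under assumptions, not the documented procedure: the helper names `patch_all_subscriptions` and `show_csi_tolerations` are hypothetical, and the merge patch replaces any tolerations already set in `spec.config`.

```shell
#!/usr/bin/env bash
# Tolerations from the examples above, written as a JSON array.
TOLERATIONS='[{"effect":"NoSchedule","key":"xyz","operator":"Equal","value":"true"},{"effect":"NoSchedule","key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true"}]'

# Hypothetical helper: set spec.config.tolerations on every subscription
# in the namespace (default: openshift-storage).
patch_all_subscriptions() {
  local ns="${1:-openshift-storage}" sub
  for sub in $(oc get subscriptions.operators.coreos.com -n "$ns" -o name); do
    oc patch "$sub" -n "$ns" --type merge \
      -p "{\"spec\":{\"config\":{\"tolerations\":${TOLERATIONS}}}}"
  done
}

# Hypothetical helper: print the CSI tolerations currently stored in the
# rook-ceph-operator configmap, to review them after step 3.
show_csi_tolerations() {
  oc get configmap rook-ceph-operator-config -n openshift-storage \
    -o 'jsonpath={.data.CSI_PLUGIN_TOLERATIONS}{"\n"}{.data.CSI_PROVISIONER_TOLERATIONS}{"\n"}'
}

# Usage (against a cluster you are logged in to):
#   patch_all_subscriptions openshift-storage
#   show_csi_tolerations
```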

For ODF 4.14 and below

  1. Get the list of subscriptions:
# oc get subs -n openshift-storage

Example Output

# oc get sub
NAME                                                                         PACKAGE                   SOURCE             CHANNEL
mcg-operator-stable-4.14-redhat-operators-openshift-marketplace              mcg-operator              redhat-operators   stable-4.14
ocs-operator-stable-4.14-redhat-operators-openshift-marketplace              ocs-operator              redhat-operators   stable-4.14
odf-csi-addons-operator-stable-4.14-redhat-operators-openshift-marketplace   odf-csi-addons-operator   redhat-operators   stable-4.14
odf-operator                                                                 odf-operator              redhat-operators   stable-4.14
  2. Edit all the listed subscriptions to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-metrics-exporter, ocs-operator, rook-ceph-operator, ux-backend-server and noobaa-operator pods:
# oc edit sub <subscription_name> -n openshift-storage

Syntax:

spec:
  config:
    tolerations:
    <your toleration>

For Example:

spec:
  config:
    tolerations:
    - effect: NoSchedule
      key: xyz
      operator: Equal
      value: "true"
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  3. Edit the configmap of rook-ceph-operator to add the toleration to the csi-plugin & csi-provisioner pods:
# oc edit configmap rook-ceph-operator-config -n openshift-storage

Syntax:

data:
  CSI_PLUGIN_TOLERATIONS: |2-
    <toleration>
  CSI_PROVISIONER_TOLERATIONS: |2-
    <toleration>

For Example:

data:
  CSI_PLUGIN_TOLERATIONS: |2-

    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule

Note: Do not remove the node.ocs.openshift.io/storage toleration for CSI pods while adding any new toleration.

  4. Edit the StorageCluster CR to add the desired toleration for the Rook Ceph and Noobaa pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage

For Example:

spec:
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
    rgw:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
  5. Edit the OCSInitialization resource to add the desired toleration for the toolbox pod:
# oc edit ocsinitializations.ocs.openshift.io -n openshift-storage

Syntax:

spec:
  tolerations:
  <your toleration>

For example:

spec:
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"

For 'holder' plugin pods for ODF-4.13 and above

  • ODF 4.13 and above run additional 'holder' plugin pods:
    csi-rbdplugin-holder-ocs-storagecluster-cephcluster and csi-cephfsplugin-holder-ocs-storagecluster-cephcluster

  • The steps to add tolerations to the holder plugins are different.

  • By design, the holder pod is not updated when something changes in Rook, because it holds the mounts. Editing the configmap alone therefore does not work here. This is ODF working as intended: it keeps user pods from encountering blocked I/O to PVCs. We are working to get this documented; refer to BZ2218015.

  • First add the required tolerations to the configmap. Then, to update the daemonset template, follow the steps listed after the configmap example:

# oc edit configmap rook-ceph-operator-config -n openshift-storage
data:
  CSI_PLUGIN_TOLERATIONS: |2-

    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - key: xyz
      operator: Equal
      value: "true"
      effect: NoSchedule

a. List all the nodes where the holder pods are running.
b. Delete the holder daemonset. For example:
# oc delete ds csi-rbdplugin-holder-my-cluster --cascade=orphan
c. Restart the Rook operator pod.
d. If there are existing PVCs on the nodes, the existing volumes must be remounted. Bringing the holder pod back to a running state does not remount the volumes.
e. Reboot the nodes on which the holder pods were recreated. This forces the PVs to be reattached.
f. Check that all PVCs are accessible and that the application pods are in the Running state.
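
Steps a to c above could be sketched as shell helpers. This is a hedged sketch, not the documented procedure: the function names are hypothetical, the namespace is assumed to be openshift-storage, and the Rook operator is assumed to run as a deployment named rook-ceph-operator. Steps d to f (remounting, node reboots and the final check) are deliberately left as manual actions.

```shell
#!/usr/bin/env bash
NS=openshift-storage   # assumption: ODF is installed in this namespace

# Step a: list the holder pods together with the node each one runs on.
list_holder_pods() {
  oc get pods -n "$NS" -o wide | grep -- '-holder-'
}

# Step b: delete a holder daemonset but leave its pods running (orphaned),
# so that existing mounts are not interrupted.
delete_holder_daemonset() {
  oc delete ds "$1" -n "$NS" --cascade=orphan
}

# Step c: restart the Rook operator so that it recreates the daemonset.
# Assumption: the operator runs as deployment/rook-ceph-operator.
restart_rook_operator() {
  oc rollout restart deployment/rook-ceph-operator -n "$NS"
}

# Usage (against a cluster you are logged in to):
#   list_holder_pods
#   delete_holder_daemonset csi-rbdplugin-holder-ocs-storagecluster-cephcluster
#   restart_rook_operator
```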
