How to add tolerations for "non-ocs" taints to the OpenShift Data Foundation pods?
There may be situations where the worker nodes already carry a non-ODF taint and the administrator wants the OpenShift Data Foundation (ODF) pods (ODF was previously known as OpenShift Container Storage) to run on those tainted nodes as well. In that case, a toleration for the taint must be added to the ODF pods. The steps below describe how to add the toleration.
NOTE:
- On a new ODF installation, it is important to install ODF without taints on the OCS nodes, then add the tolerations (for the infra taints that will be set at the end) as described below, check that the deployments and pods were reconciled with the tolerations, and finally add the taints to the nodes. If the taints are added before the tolerations, OpenShift may fail to reconcile the toleration changes into the deployments and pods.
- If you want to schedule only specific pods on specific nodes, you need to use both the taint/toleration and nodeSelector features: use a taint to block the non-target pods from the nodes, and use a nodeSelector to run the desired pods on the specific nodes. This article covers only the taint/toleration configuration. For information on how to set up a nodeSelector in ODF, please refer to this knowledge article.
To remove a taint (example where key=xyz, value=true and effect=NoSchedule):
# oc adm taint nodes node_name xyz=true:NoSchedule-
To add a taint, for example:
# oc adm taint nodes node_name xyz=true:NoSchedule
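All the tolerations used throughout this article follow the same Kubernetes schema. A minimal annotated sketch (the key xyz and value true are placeholders for the actual non-ODF taint on your nodes):

```yaml
tolerations:
- key: xyz            # must match the taint key on the node
  operator: Equal     # "Equal" requires the value to match; "Exists" matches any value for the key
  value: "true"       # must match the taint value (quoted, since toleration values are strings)
  effect: NoSchedule  # must match the taint effect (NoSchedule, PreferNoSchedule or NoExecute)
```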
For ODF 4.21 and above
Follow the sections For ODF 4.16 and above and For ODF 4.19 and above; the additional changes needed for ODF 4.21 and above are:
a. When you edit the StorageCluster CR in step 3, add the toleration below along with the rest mentioned in that section; this toleration is for the blackbox-exporter pod:
blackbox-exporter:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
For ODF 4.19 and above
Follow the section For ODF 4.16 and above; the additional changes needed for ODF 4.19 and above are:
a. When you edit the StorageCluster CR in step 3, add the toleration below along with the rest mentioned in that section; this toleration is for the ocs-provider-server pod:
api-server:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
b. To add tolerations to the CSI pods, instead of passing the CSI-specific tolerations in the StorageCluster CR, follow the steps below:
# oc get driver -n openshift-storage
NAME AGE
openshift-storage.cephfs.csi.ceph.com 6d23h
openshift-storage.nfs.csi.ceph.com 6d23h
openshift-storage.rbd.csi.ceph.com 6d23h
# oc edit driver <driver_name> -n openshift-storage
Syntax:
spec:
controllerPlugin:
tolerations:
<your toleration>
nodePlugin:
tolerations:
<your toleration>
For Example:
spec:
controllerPlugin:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
nodePlugin:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
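After editing the driver CRs, it is worth confirming that the tolerations were reconciled into the CSI workloads before tainting the nodes. A quick sketch, assuming the default daemonset name csi-rbdplugin (adjust the name if it differs in your cluster):

```shell
# Print the tolerations reconciled into the RBD plugin daemonset
oc get ds csi-rbdplugin -n openshift-storage \
  -o jsonpath='{.spec.template.spec.tolerations}'
```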
Note: There is a known issue, tracked in DFBUGS-3797, in ODF 4.19 where the Noobaa DB pods do not honour custom tolerations. This is fixed in ODF 4.19.5; to apply custom tolerations to the Noobaa pods, install or upgrade to ODF 4.19.5 or later.
For ODF 4.18 and above
Follow the section For ODF 4.16 and above; the additional changes needed for ODF 4.18 and above are:
a. When you edit the StorageCluster CR in step 3, add the toleration below along with the rest mentioned in that section; this toleration is for the ocs-provider-server pod:
api-server:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
For ODF 4.16 and above
- Get the list of subscriptions:
# oc get subs -n openshift-storage
Example Output
# oc get sub
NAME PACKAGE SOURCE CHANNEL
mcg-operator-stable-4.16-redhat-operators-openshift-marketplace mcg-operator redhat-operators stable-4.16
ocs-client-operator-stable-4.16-redhat-operators-openshift-marketplace ocs-client-operator redhat-operators stable-4.16
ocs-operator-stable-4.16-redhat-operators-openshift-marketplace ocs-operator redhat-operators stable-4.16
odf-csi-addons-operator-stable-4.16-redhat-operators-openshift-marketplace odf-csi-addons-operator redhat-operators stable-4.16
odf-operator odf-operator redhat-operators stable-4.16
odf-prometheus-operator-stable-4.16-redhat-operators-openshift-marketplace odf-prometheus-operator redhat-operators stable-4.16
recipe-stable-4.16-redhat-operators-openshift-marketplace recipe redhat-operators stable-4.16
rook-ceph-operator-stable-4.16-redhat-operators-openshift-marketplace rook-ceph-operator redhat-operators stable-4.16
- Edit the odf-operator subscription to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-operator, rook-ceph-operator, csi-addons-controller-manager, ux-backend-server and noobaa-operator pods:
# oc edit sub odf-operator -n openshift-storage
Syntax:
spec:
config:
tolerations:
<your toleration>
For Example:
spec:
config:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
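Tolerations set under the subscription's spec.config are propagated by OLM into the operator deployments. A quick way to check that the change was reconciled before adding the taint to the nodes (deployment names may vary by version):

```shell
# Inspect the tolerations OLM injected into one of the operator deployments
oc get deployment ocs-operator -n openshift-storage \
  -o jsonpath='{.spec.template.spec.tolerations}'
```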
- Edit the StorageCluster CR to add the desired toleration for the Rook Ceph, Noobaa, csi-plugin, csi-provisioner, metrics-exporter and toolbox pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage
For Example:
spec:
placement:
all:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
csi-plugin:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
csi-provisioner:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
mds:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
metrics-exporter:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
noobaa-core:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
rgw:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
toolbox:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
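Once the StorageCluster CR has been reconciled and the taints are applied to the nodes, you can confirm that the ODF pods are scheduled onto the tainted nodes:

```shell
# Show the taints on each node ...
oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# ... and which nodes the ODF pods landed on
oc get pods -n openshift-storage -o wide
```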
For ODF 4.15
- Get the list of subscriptions:
# oc get subs -n openshift-storage
Example Output
# oc get sub
NAME PACKAGE SOURCE CHANNEL
mcg-operator-stable-4.15-redhat-operators-openshift-marketplace mcg-operator redhat-operators stable-4.15
ocs-operator-stable-4.15-redhat-operators-openshift-marketplace ocs-operator redhat-operators stable-4.15
odf-csi-addons-operator-stable-4.15-redhat-operators-openshift-marketplace odf-csi-addons-operator redhat-operators stable-4.15
odf-operator odf-operator redhat-operators stable-4.15
- Edit all the listed subscriptions to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-metrics-exporter, ocs-operator, rook-ceph-operator, csi-addons-controller-manager, ux-backend-server and noobaa-operator pods:
# oc edit sub <subscription_name> -n openshift-storage
Syntax:
spec:
config:
tolerations:
<your toleration>
For Example:
spec:
config:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
- Edit the rook-ceph-operator configmap to add the toleration to the csi-plugin and csi-provisioner pods:
# oc edit configmap rook-ceph-operator-config -n openshift-storage
data:
CSI_PLUGIN_TOLERATIONS: |2-
<toleration>
CSI_PROVISIONER_TOLERATIONS: |2-
<toleration>
For Example:
data:
CSI_PLUGIN_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
CSI_PROVISIONER_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
Note: Do not remove the node.ocs.openshift.io/storage toleration for CSI pods while adding any new toleration.
- Edit the StorageCluster CR to add the desired toleration for the Rook Ceph, Noobaa and toolbox pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage
For Example:
spec:
placement:
toolbox:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
all:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
mds:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
noobaa-core:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
rgw:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
For ODF 4.14 and below
- Get the list of subscriptions:
# oc get subs -n openshift-storage
Example Output
# oc get sub
NAME PACKAGE SOURCE CHANNEL
mcg-operator-stable-4.14-redhat-operators-openshift-marketplace mcg-operator redhat-operators stable-4.14
ocs-operator-stable-4.14-redhat-operators-openshift-marketplace ocs-operator redhat-operators stable-4.14
odf-csi-addons-operator-stable-4.14-redhat-operators-openshift-marketplace odf-csi-addons-operator redhat-operators stable-4.14
odf-operator odf-operator redhat-operators stable-4.14
- Edit all the listed subscriptions to add the desired toleration to the odf-console, odf-operator-controller-manager, ocs-metrics-exporter, ocs-operator, rook-ceph-operator, ux-backend-server and noobaa-operator pods:
# oc edit sub <subscription_name> -n openshift-storage
Syntax:
spec:
config:
tolerations:
<your toleration>
For Example:
spec:
config:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
- Edit the rook-ceph-operator configmap to add the toleration to the csi-plugin and csi-provisioner pods:
# oc edit configmap rook-ceph-operator-config -n openshift-storage
data:
CSI_PLUGIN_TOLERATIONS: |2-
<toleration>
CSI_PROVISIONER_TOLERATIONS: |2-
<toleration>
For Example:
data:
CSI_PLUGIN_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
CSI_PROVISIONER_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
Note: Do not remove the node.ocs.openshift.io/storage toleration for CSI pods while adding any new toleration.
- Edit the StorageCluster CR to add the desired toleration for the Rook Ceph and Noobaa pods:
# oc edit storagecluster ocs-storagecluster -n openshift-storage
For Example:
spec:
placement:
all:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
mds:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
noobaa-core:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
rgw:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
- effect: NoSchedule
key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
- Edit the OCSInitialization CR to add the desired toleration for the toolbox pod:
# oc edit ocsinitializations.ocs.openshift.io -n openshift-storage
spec:
tolerations:
<your toleration>
For example:
spec:
tolerations:
- effect: NoSchedule
key: xyz
operator: Equal
value: "true"
For the 'holder' plugin pods in ODF 4.13 and above
- There are additional 'holder' plugin pods: csi-rbdplugin-holder-ocs-storagecluster-cephcluster and csi-cephfsplugin-holder-ocs-storagecluster-cephcluster.
- The steps to add tolerations to the holder plugins are different.
- By design, the holder pod is not meant to be updated when something changes in Rook, as it holds the mounts, so editing the configmap alone will not work here. This is ODF working as intended, keeping user pods from encountering blocked I/O to PVCs. Work to document this behaviour is tracked in BZ2218015.
- Add the required tolerations in the configmap, and then change or update the daemonset template by following the steps below:
# oc edit configmap rook-ceph-operator-config -n openshift-storage
data:
CSI_PLUGIN_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
CSI_PROVISIONER_TOLERATIONS: |2-
- key: node.ocs.openshift.io/storage
operator: Equal
value: "true"
effect: NoSchedule
- key: xyz
operator: Equal
value: "true"
effect: NoSchedule
a. List all the nodes where the holder pods are running.
b. Delete the holder daemonset.
For example: # oc delete ds csi-rbdplugin-holder-my-cluster --cascade=orphan
c. Restart the Rook operator pod.
d. If there are existing PVCs on the nodes, the existing volumes must be remounted; bringing the holder pod back to a running state does not remount them.
e. Reboot the nodes on which the holder pods were recreated. This forces the PVs to be reattached.
f. Check that all PVCs are accessible and that the application pods are in a running state.
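The steps above can be sketched as a command sequence. This is only a sketch: the daemonset names assume the default ocs-storagecluster-cephcluster naming, and the node name is a placeholder you must replace:

```shell
# a. List the nodes where the holder pods are running
oc get pods -n openshift-storage -o wide | grep holder

# b. Delete the holder daemonsets without deleting their pods
oc delete ds csi-rbdplugin-holder-ocs-storagecluster-cephcluster \
  -n openshift-storage --cascade=orphan
oc delete ds csi-cephfsplugin-holder-ocs-storagecluster-cephcluster \
  -n openshift-storage --cascade=orphan

# c. Restart the Rook operator pod so it recreates the daemonsets
oc delete pod -l app=rook-ceph-operator -n openshift-storage

# e. Drain and reboot each node on which the holder pods were recreated,
#    so the PVs are reattached and existing volumes are remounted
oc adm drain <node_name> --ignore-daemonsets --delete-emptydir-data
# ... reboot the node, then:
oc adm uncordon <node_name>
```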