Placing all ODF components on dedicated infra nodes
Environment
- OpenShift Data Foundation 4.9+
Issue
- How to move all ODF components (including the operator pods) to dedicated infra (storage) nodes?
Resolution
Note: Updating the node selector as described below terminates the pods currently running on nodes that lack the node selector label and starts new pods on the labeled nodes.
Note: To schedule only specific pods on specific nodes, you need both taints/tolerations and a nodeSelector. The taint keeps non-target pods off the nodes, and the nodeSelector makes the desired pods run on those nodes. This article primarily covers the nodeSelector configuration. For information on how to set up taints/tolerations in ODF, please refer to this article.
Note: Adding storage taint on nodes might require toleration handling for the other daemonset pods such as openshift-dns daemonset. For information about how to manage the tolerations, see the following knowledgebase article.
- Moving the operator pods (rook-ceph-operator, ocs-operator, odf-operator-controller-manager, noobaa-operator).
  - Label the nodes that will host the ODF components.
    - By default, the nodes will already have the OCS label cluster.ocs.openshift.io/openshift-storage: ""
    - Add the infra label to the ODF nodes:
        $ oc label node <node-name> node-role.kubernetes.io/infra=""
  - Edit all subscriptions listed under the openshift-storage namespace:
      $ oc edit subscription <subscription_name> -n openshift-storage
  - Add a nodeSelector to the subscription in the openshift-storage namespace by specifying a config.nodeSelector stanza under the spec section:
      config:
        nodeSelector:
          cluster.ocs.openshift.io/openshift-storage: ""
  - Example: edit the ocs-operator subscription in the openshift-storage namespace:
      $ oc edit subscription ocs-operator -n openshift-storage
  - Add a nodeSelector to the ocs-operator subscription in the openshift-storage namespace by specifying a config.nodeSelector stanza under the spec section:
      config:
        nodeSelector:
          cluster.ocs.openshift.io/openshift-storage: ""
  - The final ocs-operator subscription resource will look similar to this:
      spec:
        channel: <channel-name>
        config:                                              <======== Added lines
          nodeSelector:                                      <========
            cluster.ocs.openshift.io/openshift-storage: ""   <========
        installPlanApproval: Automatic
        name: ocs-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        startingCSV: <starting-CSV>
  - Similarly, edit all other openshift-storage subscriptions. The list can be obtained by running the command below:
      $ oc get subscription -n openshift-storage
    - For ODF 4.15 and below: odf-operator, mcg-operator, odf-csi-addons-operator.
    - For ODF 4.16: odf-operator, mcg-operator, odf-csi-addons-operator, rook-ceph-operator, ocs-client-operator, odf-prometheus-operator, recipe.
    - For ODF 4.17: odf-operator, mcg-operator, odf-csi-addons-operator, rook-ceph-operator, ocs-client-operator, odf-prometheus-operator, recipe, cephcsi-operator.
    - For ODF 4.18 and above: odf-operator, mcg-operator, odf-csi-addons-operator, rook-ceph-operator, ocs-client-operator, odf-prometheus-operator, recipe, cephcsi-operator, odf-dependencies.
- Moving remaining ODF components.
  - By default, the ODF operator adds a toleration with the value node.ocs.openshift.io/storage=true:NoSchedule to OCS pods.
  - Add the following taint node.ocs.openshift.io/storage=true:NoSchedule to the nodes that were labeled above, so that only ODF components are scheduled on the node:
      $ oc adm taint node <node-name> node.ocs.openshift.io/storage=true:NoSchedule
- To place all the noobaa-* pods and the ocs-metrics-exporter pod on the infra nodes, you need the following StorageCluster changes:
    apiVersion: ocs.openshift.io/v1
    kind: StorageCluster
    ...
    spec:
      ...
      placement:
        noobaa-standalone:                                   <------
          nodeAffinity:                                      <------
            requiredDuringSchedulingIgnoredDuringExecution:  <------
              nodeSelectorTerms:                             <------
              - matchExpressions:                            <------
                - key: node-role.kubernetes.io/infra         <------
                  operator: Exists                           <------
          tolerations:                                       <------
          - effect: NoSchedule                               <------
            key: node-role.kubernetes.io/infra               <------
            operator: Exists                                 <------
        metrics-exporter:                                    <------
          nodeAffinity:                                      <------
            requiredDuringSchedulingIgnoredDuringExecution:  <------
              nodeSelectorTerms:                             <------
              - matchExpressions:                            <------
                - key: node-role.kubernetes.io/infra         <------
                  operator: Exists                           <------
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/infra
            operator: Exists
You must update to a version that fixes the following issues.
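The node labeling, tainting, and subscription edits from the Resolution steps can also be applied non-interactively. The following is a minimal sketch, not an official procedure: the function names prepare_storage_nodes and patch_odf_subscriptions are hypothetical, the --overwrite flags are an addition so that re-running the functions does not fail on nodes that are already labeled or tainted, and oc patch with a merge patch is used here in place of the interactive oc edit shown above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative and not part of ODF.

NAMESPACE="openshift-storage"

# Merge patch that adds spec.config.nodeSelector to a Subscription,
# matching the stanza shown in the ocs-operator example.
NODE_SELECTOR_PATCH='{"spec":{"config":{"nodeSelector":{"cluster.ocs.openshift.io/openshift-storage":""}}}}'

# Add the infra label and the storage taint to every node given as an argument.
prepare_storage_nodes() {
  local node
  for node in "$@"; do
    oc label node "$node" node-role.kubernetes.io/infra="" --overwrite
    oc adm taint node "$node" node.ocs.openshift.io/storage=true:NoSchedule --overwrite
  done
}

# Apply the nodeSelector patch to every subscription in the namespace.
# "oc get -o name" returns entries of the form <resource-type>/<name>,
# which "oc patch" accepts directly.
patch_odf_subscriptions() {
  local sub
  for sub in $(oc get subscription -n "$NAMESPACE" -o name); do
    oc patch "$sub" -n "$NAMESPACE" --type merge -p "$NODE_SELECTOR_PATCH"
  done
}
```

After sourcing the file, one possible invocation is `prepare_storage_nodes strg1.ocp.test.local strg2.ocp.test.local strg3.ocp.test.local` followed by `patch_odf_subscriptions`. Because the loop patches whatever `oc get subscription` returns, it does not depend on the per-version subscription list. As with the manual edit, the operator pods are restarted on the labeled nodes once the subscriptions are updated.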
Diagnostic Steps
- Check which nodes the pods in the openshift-storage namespace are running on:
    $ oc get pods -n openshift-storage -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
    NAME                                                             STATUS    NODE
    csi-addons-controller-manager-86f6cb6dd9-vd9b6                   Running   workre1.ocp.test.local
    csi-cephfsplugin-7bkbp                                           Running   strg1.ocp.test.local
    csi-cephfsplugin-bfh85                                           Running   worker2.ocp.test.local
    csi-cephfsplugin-provisioner-779ddc9875-ccwc2                    Running   worker2.ocp.test.local
    csi-cephfsplugin-provisioner-779ddc9875-sk2p8                    Running   worker3.ocp.test.local
    csi-cephfsplugin-q4lxh                                           Running   strg3.ocp.test.local
    csi-cephfsplugin-rv6jf                                           Running   workre1.ocp.test.local
    csi-cephfsplugin-w4l9n                                           Running   strg2.ocp.test.local
    csi-cephfsplugin-xzv4x                                           Running   worker3.ocp.test.local
    csi-rbdplugin-7rgzf                                              Running   worker3.ocp.test.local
    csi-rbdplugin-cf6mb                                              Running   workre1.ocp.test.local
    csi-rbdplugin-j8qpx                                              Running   strg3.ocp.test.local
    csi-rbdplugin-l7fsc                                              Running   strg1.ocp.test.local
    csi-rbdplugin-provisioner-576665f8c6-pc4kq                       Running   worker3.ocp.test.local
    csi-rbdplugin-provisioner-576665f8c6-wrzhg                       Running   worker2.ocp.test.local
    csi-rbdplugin-r6926                                              Running   strg2.ocp.test.local
    csi-rbdplugin-s5v7p                                              Running   worker2.ocp.test.local
    noobaa-core-0                                                    Running   strg3.ocp.test.local
    noobaa-db-pg-0                                                   Running   strg3.ocp.test.local
    noobaa-endpoint-7fcbc8d9f5-zgstf                                 Running   strg3.ocp.test.local
    noobaa-operator-bf78f4bf-gnqhm                                   Running   strg1.ocp.test.local
    ocs-metrics-exporter-76bd884668-ncv52                            Running   strg1.ocp.test.local
    ocs-operator-785bbbb5df-gjr8t                                    Running   strg1.ocp.test.local
    odf-console-75f67fd587-q9zwg                                     Running   strg1.ocp.test.local
    odf-operator-controller-manager-6658bc86c4-sbtdw                 Running   strg1.ocp.test.local
    rook-ceph-crashcollector-strg1.ocp.test.local-56fb694f5-gchh2    Running   strg1.ocp.test.local
    rook-ceph-crashcollector-strg2.ocp.test.local-55b746748d-fjf2f   Running   strg2.ocp.test.local
    rook-ceph-crashcollector-strg3.ocp.test.local-6ff86f695c-2244f   Running   strg3.ocp.test.local
    rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7f47cf8fnbvhl  Running   strg3.ocp.test.local
    rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7858c7677ckxj  Running   strg1.ocp.test.local
    rook-ceph-mgr-a-66d56588bd-kxm87                                 Running   strg2.ocp.test.local
    rook-ceph-mon-a-76bf9596d9-6szfm                                 Running   strg2.ocp.test.local
    rook-ceph-mon-b-6b8dffbb6c-4wlb2                                 Running   strg3.ocp.test.local
    rook-ceph-mon-c-5986f55686-lzg85                                 Running   strg1.ocp.test.local
    rook-ceph-operator-86b8f7d45d-spmrz                              Running   strg1.ocp.test.local
    rook-ceph-osd-0-bbdf6c994-g4kzv                                  Running   strg3.ocp.test.local
    rook-ceph-osd-1-6fdb4dfd4d-r25d8                                 Running   strg2.ocp.test.local
    rook-ceph-osd-2-5cf44cbcb6-hpfrk                                 Running   strg1.ocp.test.local
    rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-77ffb856fb47  Running   strg1.ocp.test.local
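Scanning a long pod listing by eye is error-prone, so the check can be narrowed to the pods that are not on an infra-labeled node. The following is a minimal sketch under stated assumptions: the function name pods_off_infra_nodes is hypothetical, and it assumes the nodes were labeled with node-role.kubernetes.io/infra as described in the Resolution section.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is illustrative and not part of ODF.

# Print "<pod>\t<node>" for every openshift-storage pod that runs on a node
# without the node-role.kubernetes.io/infra label.
pods_off_infra_nodes() {
  local infra_nodes
  # "-o name" prints node/<name>; strip the prefix to get bare node names.
  infra_nodes="$(oc get nodes -l node-role.kubernetes.io/infra -o name | sed 's|^node/||')"

  oc get pods -n openshift-storage --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName |
  while read -r pod node; do
    # Report the pod only if its node is not an exact match in the infra list.
    if ! printf '%s\n' "$infra_nodes" | grep -qxF -- "$node"; then
      printf '%s\t%s\n' "$pod" "$node"
    fi
  done
}
```

An empty result means every pod in the namespace is on a labeled node. Some pods, such as the csi-cephfsplugin and csi-rbdplugin pods in the example output, are seen running on worker nodes, so a non-empty result is a starting point for review rather than proof of a misconfiguration.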