CSI images on the holder pods are not updated while upgrading an ODF 4.12 cluster with Multus Tech Preview.
Environment
- ODF 4.12.0 up to and including 4.12.7 with Multus Tech Preview
- ODF 4.13.0 up to and including 4.13.2 with Multus Tech Preview
Issue
- An ODF 4.12 cluster deployed with the Multus Tech Preview feature fails to update CSI images while upgrading to ODF 4.12.3 or later.
Resolution
Background
- While updating OpenShift Data Foundation with Multus networking, the cluster does not update the CSI holder DaemonSet pods.
- By design, holder pods are intended never to be restarted except during drain or restart of an OpenShift node.
- Restarting the holder pod on an OpenShift node will result in permanently blocked storage I/O to application pods running on that node until the applications are rescheduled to a different OpenShift node.
- If a holder pod is "Running" with no issues, it is likely working as intended, and it may not be necessary to force the pods to be upgraded.
- If any holder pods are in the ImagePullBackOff error state, the update procedure is recommended. This is often seen when container images are mirrored to a private image registry.
- The ImagePullBackOff issue was fixed in the following ODF updates:
- 4.12, patch 4.12.8 and higher: https://access.redhat.com/errata/RHBA-2023:5377
- 4.13, patch 4.13.3 and higher: https://access.redhat.com/errata/RHSA-2023:5376
- All versions 4.14 and higher
- This procedure may be used to restart holder pods for any version of ODF. For example, this process may be used if the user wishes to restart all ODF pods (which includes holder pods) on a node. While this is not recommended, it may sometimes be necessary.
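A quick way to check whether the update procedure is needed is to filter the holder pod listing for image-pull errors. The sample output below is illustrative (the failing pod and its state are hypothetical); on a live cluster you would pipe the real `oc get po` command shown in the comment instead.

```shell
#!/bin/sh
# Filter holder pods stuck in image-pull errors. On a live cluster, replace
# the sample variable with the real command, e.g.:
#   oc get po -n openshift-storage \
#     -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)'
sample='NAME                                                          READY  STATUS            RESTARTS
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-flqhm     1/1    Running           0
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-b2hk8  0/1    ImagePullBackOff  0'

# Print only pods in ImagePullBackOff or ErrImagePull state.
echo "$sample" | grep -E 'ImagePullBackOff|ErrImagePull' \
  || echo "no holder pods with image-pull errors"
```

If the filter prints any pods, proceed with the procedure below; if it prints nothing, the holder pods are likely healthy and can be left alone.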
Procedure
- Perform the following procedure to update the holder pods.
- This procedure requires each OpenShift node to be fully drained. Be mindful of this requirement before deciding to proceed.
- List Ceph file system (CephFS), RADOS block device (RBD), and Network File System (NFS) holder pods that are running for holder DaemonSet.
$ oc get po -o wide -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)'
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-b2hk8 1/1 Running 0 58m 10.131.0.27 argo006.ceph.redhat.com <none> <none>
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-bxp8c 1/1 Running 0 58m 10.128.2.80 argo007.ceph.redhat.com <none> <none>
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-jmg86 1/1 Running 0 58m 10.129.2.30 argo005.ceph.redhat.com <none> <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-flqhm 1/1 Running 0 58m 10.131.0.26 argo006.ceph.redhat.com <none> <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-sphkh 1/1 Running 0 58m 10.128.2.79 argo007.ceph.redhat.com <none> <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-xttqj 1/1 Running 0 58m 10.129.2.29 argo005.ceph.redhat.com <none> <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-b3hk8 1/1 Running 0 58m 10.131.0.27 argo006.ceph.redhat.com <none> <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-avp8c 1/1 Running 0 58m 10.128.2.80 argo007.ceph.redhat.com <none> <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-gux86 1/1 Running 0 58m 10.129.2.30 argo005.ceph.redhat.com <none> <none>
- Remount all the applications running on the nodes listed in the previous step with the updated holder pod. If you delete a holder pod without first moving the applications to another node, the mount breaks and those applications lose access to storage. You must move the applications to different nodes and then restart the holder pod on the node.
a. Identify the node from the pod list of the previous step.
b. Mark the node as unschedulable.
$ oc adm cordon <node1>
For example:
$ oc adm cordon argo006.ceph.redhat.com
node/argo006.ceph.redhat.com cordoned
c. Drain the node to remove all the running pods.
$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force
For example:
$ oc adm drain argo006.ceph.redhat.com --ignore-daemonsets --delete-emptydir-data --force
node/argo006.ceph.redhat.com already cordoned
...
d. If desired, wait for user applications running on the node to become Ready on other nodes.
e. Delete the holder pods on the node that are displayed in the pod list of step 1.
$ oc delete po csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-b2hk8 csi-rbdplugin-holder-ocs-storagecluster-cephcluster-flqhm csi-nfsplugin-holder-ocs-storagecluster-cephcluster-b3hk8
f. Mark the node as schedulable.
$ oc adm uncordon <node1>
For example:
$ oc adm uncordon argo006.ceph.redhat.com
node/argo006.ceph.redhat.com uncordoned
g. Repeat the steps for all the nodes displayed in the pod list of step 1.
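The per-node cordon/drain/uncordon cycle above can be sketched as a small shell helper. The function name, the DRY_RUN guard, and the stubbed `oc` are illustrative additions for safe experimentation, not part of the supported procedure; the holder pod deletion is left as a comment because the pod names differ per node.

```shell
#!/bin/sh
# Hypothetical helper: cordon and drain a node, then uncordon it, so the
# updated holder DaemonSet recreates its pods there. With DRY_RUN=1 the
# oc commands are only printed, not executed.
cycle_holder_node() {
    node="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Shadow oc with a stub that echoes the command it would run.
        oc() { echo "would run: oc $*"; }
    fi
    oc adm cordon "$node"
    oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force
    # Between the drain and the uncordon, delete the holder pods that the
    # pod list of step 1 shows on this node, e.g.:
    #   oc delete po -n openshift-storage <holder pods on this node>
    oc adm uncordon "$node"
}

# Example (dry run, using a node name from the listing above):
DRY_RUN=1 cycle_holder_node argo006.ceph.redhat.com
```

Run one node at a time and wait for applications to become Ready elsewhere before moving on, exactly as in steps a through g.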
- Verify that the holder pods are updated to use the latest version of the cephcsi image in the holder DaemonSet.
$ oc get ds -n openshift-storage csi-cephfsplugin-holder-my-cluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
$ oc get ds -n openshift-storage csi-rbdplugin-holder-my-cluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
$ oc get ds -n openshift-storage csi-nfsplugin-holder-my-cluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
$ oc get po -n openshift-storage -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)' -o jsonpath="{.items[*].spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
[...]
- Depending on the number of nodes, multiple images are displayed in the output.
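The verification can also be done programmatically by comparing each pod image against the DaemonSet image. The digest below is the one shown in the sample output above; the three-pod sample list is illustrative, and on a live cluster both variables would be filled from the `oc ... -o jsonpath` queries (one assumed example is shown in the comment).

```shell
#!/bin/sh
# Verify that every holder pod runs the same image as its DaemonSet spec.
# On a live cluster, capture the values with the jsonpath queries above, e.g.:
#   ds_image=$(oc get ds -n openshift-storage csi-rbdplugin-holder-my-cluster \
#     -o jsonpath="{.spec.template.spec.containers[*].image}")
ds_image='registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0'
pod_images="$ds_image $ds_image $ds_image"   # sample: three matching holder pods

status=ok
for img in $pod_images; do
    if [ "$img" != "$ds_image" ]; then
        echo "stale holder image: $img"
        status=stale
    fi
done
echo "check result: $status"
```

Any pod image that does not match the DaemonSet image indicates a holder pod that still needs to be cycled with the drain procedure above.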
Root Cause
- The CSI controller of the Rook-Ceph Operator creates a csi-plugin-holder DaemonSet configured to use the network.selectors.public network specified for the CephCluster CR. This DaemonSet runs on all nodes alongside the csi-{cephfs,rbd,nfs}plugin DaemonSets. The holder DaemonSet contains only a single container, called holder, which is responsible for pinning the network for filesystem mounts and mapped block devices.
- The holder DaemonSet is used by the Ceph-CSI plugin pod as a stable network namespace from which to mount Ceph storage Persistent Volume Claims (PVCs). For more information about the holder pod, see the CSI pods documentation on github.com.
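The network selector referenced above lives on the CephCluster CR. A minimal sketch of the relevant stanza, assuming a Multus NetworkAttachmentDefinition named public-net in the openshift-storage namespace (both names are illustrative):

```yaml
# Excerpt of a CephCluster CR (resource names are illustrative)
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ocs-storagecluster-cephcluster
  namespace: openshift-storage
spec:
  network:
    provider: multus
    selectors:
      # The holder DaemonSet is configured to join this public network
      public: openshift-storage/public-net
```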
- The above points are intentional architectural designs that allow ODF to achieve long-term storage stability through ODF version upgrades.
- ODF versions earlier than 4.12.8 (on 4.12), earlier than 4.13.3 (on 4.13), and earlier than 4.14.0 contained a bug wherein ODF did not update holder pod DaemonSets to refer to the latest ODF image for new holders, resulting in ImagePullBackOff errors for users employing container image registry mirroring.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.