CSI Images on the holder pods in ODF 4.12 with Multus Tech Preview are not updated while upgrading the cluster.

Solution Verified

Environment

  • ODF 4.12.0 up to and including 4.12.7 with Multus Tech Preview
  • ODF 4.13.0 up to and including 4.13.2 with Multus Tech Preview

Issue

  • An ODF 4.12 cluster deployed with the Multus Tech Preview feature fails to update CSI images while upgrading to ODF 4.12.3 and later versions.

Resolution

Background

  • While updating OpenShift Data Foundation with Multus networking, the cluster does not update the CSI holder DaemonSet pods.

  • By design, holder pods are intended never to be restarted except during drain or restart of an OpenShift node.

  • Restarting the holder pod on an OpenShift node will result in permanently blocked storage I/O to application pods running on that node until the application is rescheduled to a different OpenShift node.

  • If a holder pod is "Running" with no issues, it is likely working as intended, and it may not be necessary to force the pods to be upgraded.

  • If any holder pods are in the ImagePullBackOff error state, the update procedure is recommended. This is often seen when container images are mirrored to a private image registry.

    • The ImagePullBackOff issue was fixed in the following ODF updates:
      • 4.12, patch 4.12.8 and higher: https://access.redhat.com/errata/RHBA-2023:5377
      • 4.13, patch 4.13.3 and higher: https://access.redhat.com/errata/RHSA-2023:5376
      • All versions 4.14 and higher
  • This procedure may be used to restart holder pods on any version of ODF. For example, it may be used when a user wants to restart all ODF pods (including holder pods) on a node. While this is not recommended, it may sometimes be necessary.
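As a quick pre-check before deciding whether the procedure is needed, the holder pods can be scanned for the ImagePullBackOff state. The helper below is a sketch and not part of the official procedure; it assumes a logged-in `oc` session and the default openshift-storage namespace.

```shell
# Hypothetical pre-check helper (not part of the official procedure):
# print any holder pods whose STATUS column is ImagePullBackOff.
# Assumes `oc` is logged in and holder pods run in openshift-storage.
holder_pods_in_backoff() {
  oc get po -n openshift-storage --no-headers \
    -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)' |
    awk '$3 == "ImagePullBackOff"'
}

# No output means no holder pod is currently stuck pulling its image.
```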

Procedure

  • Perform the following procedure to update the holder pods.

  • This procedure requires each OpenShift node to be fully drained. Be mindful of this requirement before deciding to proceed.

  1. List the Ceph File System (CephFS), RADOS Block Device (RBD), and Network File System (NFS) holder pods that are running for the holder DaemonSets.
$ oc get po -n openshift-storage -o wide -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)'
NAME                                                           READY   STATUS    RESTARTS   AGE   IP            NODE                      NOMINATED NODE   READINESS GATES
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-b2hk8   1/1     Running   0          58m   10.131.0.27   argo006.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-bxp8c   1/1     Running   0          58m   10.128.2.80   argo007.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-jmg86   1/1     Running   0          58m   10.129.2.30   argo005.ceph.redhat.com   <none>           <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-flqhm      1/1     Running   0          58m   10.131.0.26   argo006.ceph.redhat.com   <none>           <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-sphkh      1/1     Running   0          58m   10.128.2.79   argo007.ceph.redhat.com   <none>           <none>
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-xttqj      1/1     Running   0          58m   10.129.2.29   argo005.ceph.redhat.com   <none>           <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-b3hk8      1/1     Running   0          58m   10.131.0.27   argo006.ceph.redhat.com   <none>           <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-avp8c      1/1     Running   0          58m   10.128.2.80   argo007.ceph.redhat.com   <none>           <none>
csi-nfsplugin-holder-ocs-storagecluster-cephcluster-gux86      1/1     Running   0          58m   10.129.2.30   argo005.ceph.redhat.com   <none>           <none>
  2. Remount all applications running on the nodes listed in the previous step so that they use the updated holder pod. Deleting the holder pod without first moving the applications to another node results in a broken mount, and all of those applications lose access to storage. Move the applications to different nodes and then restart the holder pod on the node.

    a. Identify the node from the pod list of the previous step.
    b. Mark the node as unschedulable.

    $ oc adm cordon <node1>
    

    For example:

    $ oc adm cordon argo006.ceph.redhat.com 
      node/argo006.ceph.redhat.com cordoned
    

    c. Drain the node to remove all the running pods.

    $ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force
    

    For example:

    $ oc adm drain argo006.ceph.redhat.com --ignore-daemonsets --delete-emptydir-data --force
      node/argo006.ceph.redhat.com already cordoned
      ...
    

    d. If desired, wait for user applications running on the node to become Ready on other nodes.
    e. Delete the holder pods on the node, as displayed in the pod list of step 1.

    $ oc delete po -n openshift-storage csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-b2hk8 csi-rbdplugin-holder-ocs-storagecluster-cephcluster-flqhm csi-nfsplugin-holder-ocs-storagecluster-cephcluster-b3hk8
    

    f. Mark the node as schedulable.

    $ oc adm uncordon <node1>
    

    For example:

    $ oc adm uncordon argo006.ceph.redhat.com
     node/argo006.ceph.redhat.com uncordoned
    

    g. Repeat these steps for all the nodes displayed in the pod list of step 1.
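Steps a through f can be collected into a small per-node helper. This is only a sketch under the assumptions of this article (cluster-admin access, holder pods in the openshift-storage namespace), not official automation; review how draining affects your workloads before using anything like it.

```shell
# Hypothetical per-node helper mirroring steps a-f above.
# Assumes cluster-admin access; holder pods live in openshift-storage.
restart_holder_pods_on_node() {
  node="$1"
  # b. Mark the node as unschedulable.
  oc adm cordon "$node" || return 1
  # c. Drain the node to remove all running pods.
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force || return 1
  # e. Delete only the holder pods scheduled on this node.
  oc delete po -n openshift-storage \
    -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)' \
    --field-selector "spec.nodeName=$node" || return 1
  # f. Mark the node as schedulable again.
  oc adm uncordon "$node"
}

# g. Repeat for every node from the step 1 listing, one node at a time:
#    restart_holder_pods_on_node argo006.ceph.redhat.com
```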

  3. Verify that the holder pods are updated to use the latest version of the cephcsi image in the holder DaemonSets.

$ oc get ds -n openshift-storage csi-cephfsplugin-holder-ocs-storagecluster-cephcluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0

$ oc get ds -n openshift-storage csi-rbdplugin-holder-ocs-storagecluster-cephcluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0

$ oc get ds -n openshift-storage csi-nfsplugin-holder-ocs-storagecluster-cephcluster -o jsonpath="{.spec.template.spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0

$ oc get po -n openshift-storage -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)' -o jsonpath="{.items[*].spec.containers[*].image}"
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0
registry.redhat.io/odf4/cephcsi-rhel9@sha256:c008d6b133807c2c60b2a7ae5a5e5435d1e6ac67f553ec7775e5469bc00315f0

[...]
  • Depending on the number of nodes, multiple images are displayed in the output.
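The per-DaemonSet checks above can be condensed into one listing of every unique image referenced by the running holder pods; after a successful update, each line should match the image set on the corresponding holder DaemonSet. The helper below is a sketch with the same openshift-storage namespace assumption as the rest of this article.

```shell
# Hypothetical verification helper: print one line per unique image
# used by running holder pods. After the update, every line should
# match the image configured on the holder DaemonSets.
holder_pod_images() {
  oc get po -n openshift-storage \
    -l 'app in (csi-rbdplugin-holder,csi-cephfsplugin-holder,csi-nfsplugin-holder)' \
    -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' |
    sort -u
}
```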

Root Cause

  • The CSI controller of the Rook-Ceph Operator creates a csi-plugin-holder DaemonSet configured to use the network.selectors.public network specified in the CephCluster CR. This DaemonSet runs on all nodes alongside the csi-{cephfs,rbd,nfs}plugin DaemonSets. The holder DaemonSet contains only a single container, called holder, which is responsible for pinning the network for filesystem mounts and mapped block devices.

  • The holder DaemonSet is used by the Ceph-CSI plugin pod as a stable network namespace from which to mount Ceph storage Persistent Volume Claims (PVCs). For more information about the holder pod, see the CSI pods documentation on github.com.

  • The behavior described above is an intentional architectural design that allows ODF to maintain long-term storage stability across ODF version upgrades.

  • ODF versions earlier than 4.12.8, 4.13.3, and 4.14.0 contained a bug in which ODF did not update the holder DaemonSets to refer to the latest ODF image for new holder pods, resulting in ImagePullBackOff errors for users who mirror container images to a private registry.
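To see which Multus public network the holder DaemonSets are pinned to, the CephCluster CR can be inspected. The helper below is a sketch; it assumes the default ODF CephCluster name, ocs-storagecluster-cephcluster, in the openshift-storage namespace.

```shell
# Hypothetical check: print the Multus public network selector from the
# CephCluster CR (ocs-storagecluster-cephcluster is the ODF default name).
cephcluster_public_network() {
  oc get cephcluster -n openshift-storage ocs-storagecluster-cephcluster \
    -o jsonpath='{.spec.network.selectors.public}{"\n"}'
}
```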


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.