RHOCP 4 Operator pods blocking the unmount of CSI volumes

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4

Issue

  • Mounts are not removed because OpenShift processes are still accessing the mountpoint.
  • CSI volumes cannot be mounted and fail with the errors "mpatha: map in use" and "mpatha: can't flush".
  • Multipath cannot flush unmounted volumes because they are kept busy by ovnkube.

Resolution

This issue has been reported to Red Hat engineering. It is tracked in the following bugs, which are fixed in the versions listed:

Node Tuning Operator

RHOCP Release   Bug             Fixed version   Errata
4.14            OCPBUGS-14946   4.14.0          RHSA-2023:5006
4.13            OCPBUGS-15738   4.13.5          RHSA-2023:4091

OVNKubernetes and Multus

RHOCP Release   Bug             Fixed version   Errata
4.17            OCPBUGS-30950   4.17.1          RHSA-2024:7922
4.16            OCPBUGS-36594   4.16.19         RHSA-2024:8415
4.15            OCPBUGS-42753   4.15.40         RHSA-2024:10839
4.14            OCPBUGS-42754   4.14.45         RHSA-2025:0364
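
To decide whether a cluster already contains a fix or still needs a workaround, the running version can be compared with the "Fixed version" column. The following is a minimal sketch, not part of the original solution: `version_ge` is a hypothetical helper that compares dot-separated numeric fields only, so pre-release suffixes such as `-rc.1` are not handled precisely.

```shell
# Hypothetical helper (an assumption, not an oc feature):
# succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Compare each field numerically; missing fields count as 0.
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# Example usage on a 4.16 cluster (requires a logged-in oc session):
#   current=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   version_ge "$current" 4.16.19 && echo "OVNKubernetes/Multus fix is included"
```

The fixed version to compare against depends on the minor release in use, so pick the row of the table that matches the cluster.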

Workaround for ovnkube-node and multus

Patch the daemonsets so that the /var/lib/kubelet volumeMount uses the HostToContainer mount propagation:

$ oc -n openshift-multus patch daemonset multus --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/volumeMounts/10",
    "value": {
      "name": "host-var-lib-kubelet",
      "mountPath": "/var/lib/kubelet",
      "mountPropagation": "HostToContainer"
    }
  }
]'
$ oc -n openshift-ovn-kubernetes patch daemonset ovnkube-node --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/7/volumeMounts/1",
    "value": {
      "name": "host-kubelet",
      "mountPath": "/var/lib/kubelet",
      "mountPropagation": "HostToContainer",
      "readOnly": true
    }
  }
]'
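
Because both patches use "replace" on a fixed container and volumeMount index, it may be worth confirming that the index really points at the /var/lib/kubelet mount before patching, since the position could differ between releases. The sketch below is an assumption layered on top of the commands above: `mount_at_index` and `safe_to_patch` are hypothetical helper names, and only standard `oc get -o jsonpath` output is used.

```shell
# Hypothetical helper: print "<volumeMount name> <mountPath>" for the
# given namespace, daemonset, container index and volumeMount index.
mount_at_index() {
  ns=$1; ds=$2; cidx=$3; midx=$4
  oc -n "$ns" get daemonset "$ds" \
    -o jsonpath="{.spec.template.spec.containers[$cidx].volumeMounts[$midx].name} {.spec.template.spec.containers[$cidx].volumeMounts[$midx].mountPath}"
}

# Hypothetical helper: succeeds only when the entry at that index has
# the expected name ($5) and is mounted at /var/lib/kubelet.
safe_to_patch() {
  [ "$(mount_at_index "$1" "$2" "$3" "$4")" = "$5 /var/lib/kubelet" ]
}

# Example usage, with the indices taken from the patches above:
#   safe_to_patch openshift-multus multus 0 10 host-var-lib-kubelet \
#     || echo "multus: index 10 is not the kubelet mount, do not patch"
#   safe_to_patch openshift-ovn-kubernetes ovnkube-node 7 1 host-kubelet \
#     || echo "ovnkube-node: index 1 is not the kubelet mount, do not patch"
```

If either check fails, the patch would overwrite a different volumeMount, so the indices should be adjusted to the actual layout first.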

Workaround for tuned

  • This workaround applies to the tuned pods created by the node-tuning ClusterOperator and should only be used when suggested by a Red Hat Support Associate. Note that it is temporary: an upgrade of the operator removes the mitigation unless the operator is put into unmanaged mode.

  • Verify the current mount propagation of the host volumeMount in the tuned pods:

$ oc get pods -n openshift-cluster-node-tuning-operator  -l openshift-app=tuned -o json | jq  '.items[0].spec.containers[0].volumeMounts[] | select(.name=="host")'
  • This should return no explicitly configured mountPropagation, which means that the default is used. If the command above does not return exactly the following output, stop here and contact Red Hat support.
{
  "mountPath": "/host",
  "name": "host"
}
  • Patch the tuned daemonset so that its pod template uses the HostToContainer mount propagation:
$ oc patch daemonset tuned -n openshift-cluster-node-tuning-operator --type=json -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/volumeMounts/-1", "value": {"name": "host", "mountPath": "/host", "mountPropagation": "HostToContainer"}}]'
  • Verify that the pod template of the daemonset now uses the HostToContainer mount propagation:
$ oc get ds -n openshift-cluster-node-tuning-operator tuned -o json | jq '.spec.template.spec.containers[0].volumeMounts[-1]'
{
  "mountPath": "/host",
  "mountPropagation": "HostToContainer",
  "name": "host"
}

From this point on, the tuned pods should no longer prevent the unmounting of CSI volumes.
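
The check above inspects the daemonset template, so pods that have not yet been recreated may still run with the old setting. As a rough sketch, the running pods themselves can be listed and filtered. `tuned_pods_missing_propagation` is a hypothetical helper name, and the sketch assumes that the host volumeMount lives in the first container, as in the commands above.

```shell
# Hypothetical helper: print the names of tuned pods whose "host"
# volumeMount does not (yet) use HostToContainer propagation.
tuned_pods_missing_propagation() {
  oc get pods -n openshift-cluster-node-tuning-operator -l openshift-app=tuned \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].volumeMounts[?(@.name=="host")].mountPropagation}{"\n"}{end}' |
    awk '$2 != "HostToContainer" { print $1 }'
}

# Example usage: an empty result means every tuned pod was recreated
# with the patched template.
#   tuned_pods_missing_propagation
```

An empty mountPropagation field is treated as "not patched", which matches the default output shown earlier in this workaround.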

Root Cause

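When a container mounts the directory /var/lib/kubelet without mountPropagation: HostToContainer, it is not made aware of changes to the PersistentVolumes mounted there for other pods. An unmount performed on the host is not propagated into the container, so the container's mount namespace keeps its own reference to the volume and the underlying device stays busy.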
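
The propagation mode that a container actually received can be read from /proc/self/mountinfo, where the optional fields before the "-" separator carry tags such as "shared:N" and "master:N". The following is an illustrative sketch only: `classify_propagation` is a hypothetical helper that reads mountinfo lines on standard input. A mount made with HostToContainer is expected to show up as "slave", and one made with the default as "private".

```shell
# Hypothetical helper: read /proc/<pid>/mountinfo lines from stdin and
# print "<mount point> <propagation type>" for each of them.
classify_propagation() {
  awk '{
    shared = 0; slave = 0; unbindable = 0
    # Optional fields start at field 7 and end at the "-" separator.
    for (i = 7; i <= NF && $i != "-"; i++) {
      if ($i ~ /^shared:/) shared = 1
      else if ($i ~ /^master:/) slave = 1
      else if ($i == "unbindable") unbindable = 1
    }
    prop = "private"
    if (unbindable) prop = "unbindable"
    else if (shared && slave) prop = "shared+slave"
    else if (shared) prop = "shared"
    else if (slave) prop = "slave"
    print $5, prop
  }'
}

# Example usage (the pod name is a placeholder, and the container image
# is assumed to ship cat):
#   oc exec -n openshift-multus <multus-pod> -- cat /proc/self/mountinfo \
#     | classify_propagation | grep /var/lib/kubelet
```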

Diagnostic Steps

Use the solution "Is it possible to flush a multipath map when multipath -f reports that the map is in use?" to confirm that ovnkube, multus-daemon or openshift-tuned is keeping the volume busy.
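
One generic way to find such processes is to search the mount tables of all processes on the node for the multipath device. The following is a minimal sketch under stated assumptions, not the procedure from the linked solution: `holders_of` is a hypothetical helper, it must run as root on the affected node, and it does a plain substring match, so "mpatha" would also match a device named "mpathab".

```shell
# Hypothetical helper: print "<pid> <command>" for every process whose
# mount namespace still references the given device name.
# $1 = device name (for example mpatha), $2 = proc root (default /proc).
holders_of() {
  dev=$1; proc_root=${2:-/proc}
  for f in "$proc_root"/[0-9]*/mountinfo; do
    [ -r "$f" ] || continue
    if grep -qF -- "$dev" "$f" 2>/dev/null; then
      pid=${f%/mountinfo}; pid=${pid##*/}
      comm=$(cat "$proc_root/$pid/comm" 2>/dev/null)
      printf '%s %s\n' "$pid" "$comm"
    fi
  done
}

# Example usage on the node, for instance after "oc debug node/<node>"
# followed by "chroot /host":
#   holders_of mpatha
```

Processes named ovnkube, multus-daemon or openshift-tuned in the output would point to the problem described in this solution.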
