CSI Migration fails for PVs with missing vmdk suffix
Environment
- Red Hat OpenShift Container Platform
- 4.13
- 4.14
Issue
- During or after the CSI migration (see KCS 7011683 for more details), volumes fail to attach and mount, with events such as:
Warning FailedAttachVolume #m (x# over ##m) attachdetach-controller AttachVolume.Attach failed for volume "PV_NAME" : rpc error: code = Internal desc = failed to attach disk: "[DATASTORE] PATH/PATH" with node: "NODE_HASH" err ServerFaultCode: The object or item referred to could not be found.
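To check whether a cluster is reporting this event, a minimal sketch along the following lines may help. The function name `find_attach_failures` is hypothetical, and the filter assumes the event text contains the "could not be found" wording quoted above.

```shell
# Hypothetical helper: list FailedAttachVolume events in all namespaces
# and keep only those carrying the "could not be found" message.
find_attach_failures() {
  oc get events --all-namespaces --field-selector reason=FailedAttachVolume \
    | grep -F 'could not be found'
}
```

Running `find_attach_failures` with a logged-in `oc` session prints the matching event lines, or nothing if no such events exist.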
Resolution
To work around this issue, recreate the PersistentVolume (PV) objects with the .vmdk suffix in volumePath.
Because PV objects are deleted as part of this process, make sure the existing PVs are in the Available or Released state, and not Bound, before deleting them.
- IMPORTANT: Make sure that the [ReclaimPolicy](https://docs.openshift.com/container-platform/4.14/storage/understanding-persistent-storage.html#reclaiming_understanding-persistent-storage) of the PV is set to Retain, as described in the linked documentation:
$ oc patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
- Scale down any pod that is using the PV.
$ oc scale --replicas=0 <resource>
- Delete the corresponding PVC object.
- Since the ReclaimPolicy of the PV is set to Retain, this should not result in deletion of the PV objects.
$ oc delete pvc/<resource>
- Take a backup of the PV objects by saving their YAML.
$ oc get pv <pv> -o yaml > pv_backup.yaml
- Delete the PV objects
$ oc delete pv <pv>
- Re-create the PV objects with an updated volumePath
- Re-create each PV object with the same volumePath as before, but this time with the .vmdk suffix. You can use the YAML of the PV objects saved in the backup step as a reference. Be sure that volumePath: 'PATH.vmdk' has the .vmdk suffix.
$ vim pv_backup.yaml ### This is where you make the denoted edits above.
$ oc create -f pv_backup.yaml
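As an alternative to editing the file by hand, the volumePath change can be sketched with `sed`. This is only an illustration: the function name `add_vmdk_suffix` and the output file name are hypothetical, and it assumes volumePath sits on a single line, optionally quoted, as in the PV example in this article. It only touches volumePath; any other cleanup of the saved YAML is left to the reader.

```shell
# Hypothetical helper: copy a saved PV YAML ($1) to a new file ($2),
# appending .vmdk to the volumePath value.
# An existing .vmdk suffix is replaced by itself, so re-running is harmless.
add_vmdk_suffix() {
  sed -E "/^[[:space:]]*volumePath:/ s/(\.vmdk)?(['\"]?)[[:space:]]*$/.vmdk\2/" "$1" > "$2"
}
```

For example, `add_vmdk_suffix pv_backup.yaml pv_fixed.yaml` followed by reviewing `pv_fixed.yaml` before running `oc create -f pv_fixed.yaml`. Writing to a separate file keeps the original backup untouched.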
Root Cause
Users who encounter this issue have typically followed these steps to create PVs in the cluster.
While debugging OCPBUGS-42321, it was discovered that in-tree vSphere PVs whose volumePath lacks the .vmdk suffix fail during CSI migration. For in-tree vSphere PVs, the .vmdk extension is normally expected in the volumePath field. The following PV omits it:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 4Gi
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  volumeMode: Filesystem
  vsphereVolume:
    fsType: ext4
    volumePath: '[vsanDatastore] isos/hekumar-local'
If an administrator omitted the suffix at PV creation (as in the example above), users migrating these PVs will see CSI failures that did not occur with the in-tree driver.
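To find PVs that match this description before migrating, something like the following sketch could be used. The function name `list_pvs_missing_vmdk` is hypothetical, and it assumes affected PVs are exactly those with a non-empty `spec.vsphereVolume.volumePath` that does not end in `.vmdk`.

```shell
# Hypothetical helper: print the names of in-tree vSphere PVs whose
# volumePath does not end in .vmdk. PVs of other types have an empty
# volumePath column and are skipped.
list_pvs_missing_vmdk() {
  oc get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.vsphereVolume.volumePath}{"\n"}{end}' \
    | awk -F '\t' '$2 != "" && $2 !~ /\.vmdk$/ { print $1 }'
}
```

Each name printed is a candidate for the workaround in the Resolution section.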