CSI Migration fails for PVs with missing vmdk suffix

Solution Unverified - Updated 10 Oct 2024

Environment

Red Hat OpenShift Container Platform
- 4.13
- 4.14

Issue

During or after the CSI migration (see KCS 7011683 for more details), volumes fail to mount and events like the following are reported:

Warning  FailedAttachVolume  #m (x# over ##m)  attachdetach-controller  AttachVolume.Attach failed for volume "PV_NAME" : rpc error: code = Internal desc = failed to attach disk: "[DATASTORE] PATH/PATH" with node: "NODE_HASH" err ServerFaultCode: The object or item referred to could not be found.

Resolution

We can work around this issue by re-creating the PersistentVolume (PV) objects with the .vmdk suffix.
Make sure the existing PVs are Available or Released and not Bound, since the PV objects are deleted as part of the process.

IMPORTANT: Make sure that the [ReclaimPolicy](https://docs.openshift.com/container-platform/4.14/storage/understanding-persistent-storage.html#reclaiming_understanding-persistent-storage) of each PV is set to Retain before proceeding:

$ oc patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

Scale down any workload whose pods are using the PV.

$ oc scale --replicas=0 <resource>

Delete the corresponding PVC object.
- Since the ReclaimPolicy on the PV is set to Retain, this should not result in deletion of the PV objects.

$ oc delete pvc/<resource>
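
Before moving on to deleting the PV, it can help to confirm that it is in the expected state. The following is a minimal sketch, not part of the official procedure: `pv_ready_for_recreate` is a hypothetical helper name, and it only compares the two values passed to it against the conditions stated above (phase Available or Released, reclaim policy Retain).

```shell
# Hypothetical helper (not an oc subcommand): succeeds only when the PV
# state matches the preconditions described in this article.
#   $1 = PV phase          (.status.phase)
#   $2 = PV reclaim policy (.spec.persistentVolumeReclaimPolicy)
pv_ready_for_recreate() {
  phase=$1
  policy=$2
  if [ "$policy" != "Retain" ]; then
    echo "reclaim policy is '$policy', expected Retain" >&2
    return 1
  fi
  case "$phase" in
    Available|Released) return 0 ;;
    *)
      echo "PV phase is '$phase', expected Available or Released" >&2
      return 1
      ;;
  esac
}

# Example usage against a live cluster (requires cluster access):
#   pv_ready_for_recreate \
#     "$(oc get pv <pv> -o jsonpath='{.status.phase}')" \
#     "$(oc get pv <pv> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')"
```

A non-zero exit status means the PV should not be deleted yet.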

Take a backup of the PV objects by saving their YAML.

$ oc get pv <pv> -o yaml > pv_backup.yaml

Delete the PV objects.

$ oc delete pv <pv>

Re-create the PV objects with an updated volumePath.
- Now we can re-create the PV objects that were using the same volumePath, but this time with the .vmdk suffix. You can use the YAML saved in the backup step above as a reference. Be sure that the value of volumePath ends in .vmdk, for example volumePath: 'PATH.vmdk'.

$ vim pv_backup.yaml   ### Make the volumePath edit described above.
$ oc create -f pv_backup.yaml
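
As an alternative to editing the file by hand, the suffix can be appended with sed. This is a sketch under assumptions: `add_vmdk_suffix` is a hypothetical helper name, the output file name in the usage comment is illustrative, and the expression assumes volumePath sits on a single line, optionally wrapped in single or double quotes, as in the example PV in this article. Review the result before creating the PV.

```shell
# Hypothetical helper: print the given PV YAML with ".vmdk" appended to the
# volumePath value. A path that already ends in .vmdk is left unchanged,
# so the function is safe to run more than once.
add_vmdk_suffix() {
  sed -E "/^[[:space:]]*volumePath:/ s/(\.vmdk)?(['\"]?)[[:space:]]*\$/.vmdk\2/" "$1"
}

# Example usage (writes to a new file so the backup stays untouched):
#   add_vmdk_suffix pv_backup.yaml > pv_fixed.yaml
#   grep volumePath pv_fixed.yaml
```

Writing to a separate file keeps the original backup intact in case the edit needs to be redone.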

Root Cause

Users who encounter this issue have typically followed these steps to create PVs manually in the cluster.

It was discovered while debugging OCPBUGS-42321 that in-tree vSphere PVs whose volumePath is missing the .vmdk suffix fail during CSI migration. For in-tree vSphere PVs, the .vmdk extension is normally expected in the volumePath field. The following PV omits it:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 4Gi
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  volumeMode: Filesystem
  vsphereVolume:
    fsType: ext4
    volumePath: '[vsanDatastore] isos/hekumar-local'

If an admin omitted the suffix at PV creation (as in the example above), the in-tree driver still attaches the volume, but after migration the CSI driver fails to attach it.
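
To find PVs that may be affected, the volumePath of every in-tree vSphere PV can be listed and filtered. The following is a sketch, not a supported tool: `filter_missing_vmdk` is a hypothetical helper name, and it assumes tab-separated "NAME<TAB>VOLUME_PATH" input such as the `oc get pv` jsonpath query in the usage comment produces. PVs without a vsphereVolume section yield an empty path and are skipped.

```shell
# Hypothetical helper: read "NAME<TAB>VOLUME_PATH" lines on stdin and print
# only those whose path is non-empty and does not end in ".vmdk".
filter_missing_vmdk() {
  awk -F'\t' '$2 != "" && $2 !~ /\.vmdk$/ { print $1 "\t" $2 }'
}

# Example usage against a live cluster (requires cluster access):
#   oc get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.vsphereVolume.volumePath}{"\n"}{end}' \
#     | filter_missing_vmdk
```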

SBR

Shift Storage

Product(s)

Red Hat OpenShift Container Platform

Components

Storage

Category

Troubleshoot
