"Permission Denied" on ODF CephFS RWX Volumes with Custom SCCs
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Data Foundation (RHODF) 4
Issue
- Pods in a multi-replica deployment using a ReadWriteMany (RWX) PersistentVolumeClaim (PVC) on ODF CephFS are unable to access the shared volume concurrently.
- Typically, only one pod (often the last one to start) can access the volume mount, while all other pods receive Permission denied errors.
Resolution
Workaround
- Update the PersistentVolume (PV) object associated with the problematic PVC and add the kernelMountOptions:
- As a first step, back up all the PVCs, PVs, and Ceph data affected by this procedure. Although these steps only manipulate OCP API objects in the API server/etcd, it's a best practice to keep this information safe. The available methods for taking this backup are outside this document's scope. For further reference, please review KCS article 5456281 - OpenShift APIs for Data Protection (OADP) FAQ.
- Create a copy of the PersistentVolume:
$ oc get pv <pv-name> -o yaml > pv-backup.yaml
- Edit the PV and change the persistentVolumeReclaimPolicy field from Delete to Retain:
$ oc edit pv <pv-name>
<extra-output-removed>
  persistentVolumeReclaimPolicy: Delete   <-- Modify from Delete to Retain
- Verify the persistentVolumeReclaimPolicy is correctly set to Retain. This step prevents OCP from deleting the underlying Ceph volume in case something unexpected occurs:
$ oc get pv <pv-name>
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM        STORAGECLASS                VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-1a205b41-6628-4aac-b142-646a49ffd469   2Gi        RWO            Retain           Bound    [REDACTED]   ocs-storagecluster-cephfs   <unset>                          1y
- Edit the pv-backup.yaml file taken earlier and add the following kernelMountOptions line in the volumeAttributes section to force a generic SELinux context that all pods can access:
$ vim pv-backup.yaml
<extra-output-removed>
    volumeAttributes:
      clusterID: openshift-storage
      fsName: ocs-storagecluster-cephfilesystem
+     kernelMountOptions: context="system_u:object_r:container_file_t:s0"
      storage.kubernetes.io/csiProvisionerIdentity: 1632867397636-8081-openshift-storage.cephfs.csi.ceph.com
      subvolumeName: csi-vol-9ff22e37-27c8-11ec-9f29-0a580a81020a
    volumeHandle: 0001-0011-openshift-storage-0000000000000001-9ff22e37-27c8-11ec-9f29-0a580a81020a
- Delete the existing PV:
$ oc delete pv <pv-name>
- This deletion process might get stuck in the Terminating status. If so, edit the PV once more and remove the finalizers section:
$ oc edit pv <pv-name>
<extra-output-removed>
  finalizers:
  - kubernetes.io/pv-protection   <-- remove this line in the PV yaml
- At this point, the PVC bound to the deleted PV will show Lost status, i.e., status.phase: Lost. This is expected and doesn't impact the running IO to ODF/OCS: any pod already mounting the affected PV will keep the usual read and write access. Any new pods trying to mount this PV will stay in ContainerCreating status until the procedure is finished. After that, they'll run normally.
- Recreate the PV with the modifications done:
$ oc create -f pv-backup.yaml
- Remove the annotation pv.kubernetes.io/bind-completed from the PVC (not the PV!). This tells OCP to rebind the PV and PVC and resolves the Lost phase. Note that the "/" inside the annotation key must be escaped as "~1" in the JSON Pointer path:
$ oc patch -n $PVCNAMESPACE pvc $PVCNAME --type=json -p='[{"op": "remove", "path": "/metadata/annotations/pv.kubernetes.io~1bind-completed"}]'
Root Cause
- The ODF CephFS CSI driver is working as expected.
- When a pod mounts the volume, the entire shared directory is relabeled with the unique Multi-Category Security (MCS) label of that specific pod's container process (e.g., s0:c25,c26). When another pod with a different MCS label (e.g., s0:c27,c28) mounts the same volume, the volume is relabeled again with that pod's unique MCS categories. The last pod to successfully mount the volume "wins," effectively locking out all others. This behavior violates the ReadWriteMany contract, which requires simultaneous access for multiple pods.
- The workaround forces a generic, shared SELinux context (container_file_t:s0) on the mount, which does not contain specific MCS categories, making it accessible to any container process.
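The lockout is visible in the labels alone. A minimal local sketch (the labels are illustrative, taken from the diagnostic output in this article) that extracts the MCS categories from a SELinux context string; a pod can only access the mount when its own categories match the directory's, and the generic workaround context carries none at all:

```shell
# A SELinux context has the form user:role:type:sensitivity[:categories];
# splitting on ':' puts the MCS categories in the fifth field.
mcs_categories() { echo "$1" | awk -F':' '{print $5}'; }

proc_ctx="system_u:system_r:container_t:s0:c91,c480"        # failing pod's process
dir_ctx="system_u:object_r:container_file_t:s0:c393,c462"   # label on the mount
generic_ctx="system_u:object_r:container_file_t:s0"         # workaround context

echo "process:   $(mcs_categories "$proc_ctx")"       # c91,c480
echo "directory: $(mcs_categories "$dir_ctx")"        # c393,c462
echo "generic:   '$(mcs_categories "$generic_ctx")'"  # '' - no categories, so MCS cannot deny access
```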
Diagnostic Steps
- Confirm the pods are using a custom SCC with seLinuxContext: type: RunAsAny:
$ oc get pods web-app-749c7476b-4b7bb -o yaml | grep scc
openshift.io/scc: custom-scc
$ oc get scc custom-scc
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP PRIORITY READONLYROOTFS VOLUMES
custom-scc true ["*"] RunAsAny RunAsAny RunAsAny RunAsAny 5 false ["*"]
- Take rsh into multiple pods from the deployment and try to list the contents of the shared mount path. Note that only one pod succeeds while the others fail with Permission denied:
# This pod will fail
$ oc rsh web-app-749c7476b-4b7bb
sh-4.4$ ls -lZ /mnt/shared/testfile
ls: cannot access '/mnt/shared/testfile': Permission denied
# This pod will succeed
$ oc rsh web-app-749c7476b-vlv6p
sh-4.4$ ls -lZ /mnt/shared/testfile
-rw-rw-r--. 1 1001 1001 system_u:object_r:container_file_t:s0:c230,c706 0 Jun 9 15:47 /mnt/shared/testfile
- Inspect SELinux labels on a node:
- Debug into the worker node hosting one of the failing pods and find the process ID of the container running in the pod. For this, get the containerID first and then get the pid of the container process:
# Failing pod
$ oc get pods web-app-749c7476b-4b7bb -o json | jq -r '.status.containerStatuses[].containerID' | sed 's|^cri-o://||' | cut -c1-10
13f8286f26
$ oc debug node/<node_name>
sh-5.1# chroot /host
sh-5.1# crictl inspect --output go-template --template '{{.info.pid}}' 13f8286f26
774704
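As a side note, stripping the runtime prefix with character deletion (tr -d 'crio://-') is fragile: tr deletes every occurrence of each listed character, so it would also remove hex digits such as 'c' from inside the ID itself. A safer sketch (the sample ID below is made up) removes the cri-o:// prefix as a whole string and keeps the first 10 characters that crictl accepts:

```shell
# status.containerStatuses[].containerID has the form "cri-o://<64-hex-chars>".
container_id="cri-o://13f8286f26aa0bb1cc2dd3ee4ff5a6b7c8d9e0f112233445566778899aabb"

# Strip the runtime prefix as one string, then shorten to 10 characters.
short_id=$(echo "$container_id" | sed 's|^cri-o://||' | cut -c1-10)
echo "$short_id"   # 13f8286f26
```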
- Check the SELinux label of the container process:
# ps -efZ | grep 774704
system_u:system_r:container_t:s0:c91,c480 1001 774704 18066  0 09:50 ?  00:00:00 httpd -D FOREGROUND
- Find the physical mount path for the PV on the node:
# Failing pod
$ oc get pods web-app-749c7476b-4b7bb -o jsonpath='{.metadata.uid}'
f39de208-4178-438a-956c-b6665bc4aaf9   (referred to as pod-UID later)
$ oc debug node/<node_name>
sh-5.1# chroot /host
sh-5.1# df -h | grep f39de208-4178-438a-956c-b6665bc4aaf9 | grep mount
<IP>/volumes/csi/csi-vol-<id>  1.0G  0  1.0G  0%  /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount
- Check the SELinux label of the volume mount directory:
# ls -lZd /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount
drwxrwsr-x. 2 root 1001 system_u:object_r:container_file_t:s0:c393,c462 1 Jun  9 15:47 /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount/
- Confirm the mismatch: the MCS label on the directory (e.g., s0:c393,c462) is different from the failing container's process label (s0:c91,c480).
- Repeat the above step on the node with the working pod and see that the MCS label of the container process exactly matches the MCS label on the volume mount directory.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.