"Permission Denied" on ODF CephFS RWX Volumes with Custom SCCs
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Data Foundation (RHODF) 4
Issue
- Pods in a multi-replica deployment using a ReadWriteMany (RWX) PersistentVolumeClaim (PVC) on ODF CephFS are unable to access the shared volume concurrently.
- Typically, only one pod (often the last one to start) can access the volume mount, while all other pods receive Permission denied errors.
Resolution
Workaround
- Update the PersistentVolume (PV) object associated with the problematic PVC and add the kernelMountOptions:
- As a first step, back up all the PVCs, PVs, and Ceph data affected by this procedure. Although these steps only manipulate OCP API objects in the API server/etcd, it's a best practice to keep this information safe. The available methods for taking this backup are outside this document's scope. For further reference, please review KCS article 5456281 - OpenShift APIs for Data Protection (OADP) FAQ.
- Create a copy of the PersistentVolume:
$ oc get pv <pv-name> -o yaml > pv-backup.yaml
- Edit the PV and change the persistentVolumeReclaimPolicy field from Delete to Retain:
$ oc edit pv <pv-name>
<extra-output-removed>
  persistentVolumeReclaimPolicy: Delete   <-- Modify from Delete to Retain
- Verify the persistentVolumeReclaimPolicy is correctly set to Retain. This step prevents OCP from deleting the underlying Ceph volume in case something unexpected occurs:
$ oc get pv <pv-name>
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM        STORAGECLASS                VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-1a205b41-6628-4aac-b142-646a49ffd469   2Gi        RWO            Retain           Bound    [REDACTED]   ocs-storagecluster-cephfs   <unset>                          1y
- Edit the pv-backup.yaml file taken earlier and add the following kernelMountOptions line in the volumeAttributes section to force a generic SELinux context that all pods can access:
$ vim pv-backup.yaml
<extra-output-removed>
    volumeAttributes:
      clusterID: openshift-storage
      fsName: ocs-storagecluster-cephfilesystem
+     kernelMountOptions: context="system_u:object_r:container_file_t:s0"
      storage.kubernetes.io/csiProvisionerIdentity: 1632867397636-8081-openshift-storage.cephfs.csi.ceph.com
      subvolumeName: csi-vol-9ff22e37-27c8-11ec-9f29-0a580a81020a
    volumeHandle: 0001-0011-openshift-storage-0000000000000001-9ff22e37-27c8-11ec-9f29-0a580a81020a
- Delete the existing PV:
$ oc delete pv <pv-name>
- This deletion process might get stuck in the Terminating status. If so, edit the PV once more and remove the finalizers section:
$ oc edit pv <pv-name>
<extra-output-removed>
  finalizers:
  - kubernetes.io/pv-protection   <-- remove this line in the PV yaml
- At this point, the PVC bound to the deleted PV will show Lost status, i.e., status.phase: Lost. This is expected and doesn't impact the running IO to ODF/OCS: any pod already mounting the affected PV will keep the usual read and write access. Any new pods trying to mount this PV will stay in ContainerCreating status until the procedure is finished. After that, they'll run normally.
- Recreate the PV with the modifications done:
$ oc create -f pv-backup.yaml
- Remove the annotation pv.kubernetes.io/bind-completed from the PVC (not the PV!). This tells OCP to rebind the PV and PVC and resolves the Lost phase. Note that the "/" inside the annotation key must be escaped as "~1" in the JSON Pointer path:
$ oc patch -n $PVCNAMESPACE pvc $PVCNAME --type=json -p='[{"op": "remove", "path": "/metadata/annotations/pv.kubernetes.io~1bind-completed"}]'
Root Cause
- The ODF CephFS CSI driver is working as expected.
- When a pod mounts the volume, the entire shared directory is relabeled with the unique Multi-Category Security (MCS) label of that specific pod's container process (e.g., s0:c25,c26). When another pod with a different MCS label (e.g., s0:c27,c28) mounts the same volume, the volume is relabeled again with that pod's unique MCS categories. The last pod to successfully mount the volume "wins," effectively locking out all others. This behavior violates the ReadWriteMany contract, which requires simultaneous access for multiple pods.
- The workaround forces a generic, shared SELinux context (container_file_t:s0) on the mount, which does not contain specific MCS categories, making it accessible to any container process.
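The lockout is visible in the labels alone. A minimal local sketch (the labels are illustrative, taken from the diagnostic output in this article) that extracts the MCS categories from a SELinux context string; a pod can only access the mount when its own categories match the directory's, and the generic workaround context carries none at all:

```shell
# A SELinux context has the form user:role:type:sensitivity[:categories];
# splitting on ':' puts the MCS categories in the fifth field.
mcs_categories() { echo "$1" | awk -F':' '{print $5}'; }

proc_ctx="system_u:system_r:container_t:s0:c91,c480"        # failing pod's process
dir_ctx="system_u:object_r:container_file_t:s0:c393,c462"   # label on the mount
generic_ctx="system_u:object_r:container_file_t:s0"         # workaround context

echo "process:   $(mcs_categories "$proc_ctx")"       # c91,c480
echo "directory: $(mcs_categories "$dir_ctx")"        # c393,c462
echo "generic:   '$(mcs_categories "$generic_ctx")'"  # '' - no categories, so MCS cannot deny access
```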
Diagnostic Steps
- Confirm the pods are using a custom SCC with seLinuxContext: type: RunAsAny:
$ oc get pods web-app-749c7476b-4b7bb -o yaml | grep scc
openshift.io/scc: custom-scc
$ oc get scc custom-scc
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP PRIORITY READONLYROOTFS VOLUMES
custom-scc true ["*"] RunAsAny RunAsAny RunAsAny RunAsAny 5 false ["*"]
- Take rsh into multiple pods from the deployment and try to list the contents of the shared mount path. Note that only one pod succeeds while the others fail with Permission denied:
# This pod will fail
$ oc rsh web-app-749c7476b-4b7bb
sh-4.4$ ls -lZ /mnt/shared/testfile
ls: cannot access '/mnt/shared/testfile': Permission denied
# This pod will succeed
$ oc rsh web-app-749c7476b-vlv6p
sh-4.4$ ls -lZ /mnt/shared/testfile
-rw-rw-r--. 1 1001 1001 system_u:object_r:container_file_t:s0:c230,c706 0 Jun 9 15:47 /mnt/shared/testfile
- Inspect SELinux labels on a node:
- Debug into the worker node hosting one of the failing pods and find the process ID of the container running in the pod. For this, get the containerID first and then get the pid of the container process:
# Failing pod
$ oc get pods web-app-749c7476b-4b7bb -o json | jq -r '.status.containerStatuses[].containerID' | sed 's|^cri-o://||' | cut -c1-10
13f8286f26
$ oc debug node/<node_name>
sh-5.1# chroot /host
sh-5.1# crictl inspect --output go-template --template '{{.info.pid}}' 13f8286f26
774704
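As a side note, stripping the runtime prefix with character deletion (tr -d 'crio://-') is fragile: tr deletes every occurrence of each listed character, so it would also remove hex digits such as 'c' from inside the ID itself. A safer sketch (the sample ID below is made up) removes the cri-o:// prefix as a whole string and keeps the first 10 characters that crictl accepts:

```shell
# status.containerStatuses[].containerID has the form "cri-o://<64-hex-chars>".
container_id="cri-o://13f8286f26aa0bb1cc2dd3ee4ff5a6b7c8d9e0f112233445566778899aabb"

# Strip the runtime prefix as one string, then shorten to 10 characters.
short_id=$(echo "$container_id" | sed 's|^cri-o://||' | cut -c1-10)
echo "$short_id"   # 13f8286f26
```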
- Check the SELinux label of the container process:
# ps -efZ | grep 774704
system_u:system_r:container_t:s0:c91,c480 1001 774704 18066  0 09:50 ?  00:00:00 httpd -D FOREGROUND
- Find the physical mount path for the PV on the node:
# Failing pod
$ oc get pods web-app-749c7476b-4b7bb -o jsonpath='{.metadata.uid}'
f39de208-4178-438a-956c-b6665bc4aaf9   (referred to as pod-UID later)
$ oc debug node/<node_name>
sh-5.1# chroot /host
sh-5.1# df -h | grep f39de208-4178-438a-956c-b6665bc4aaf9 | grep mount
<IP>/volumes/csi/csi-vol-<id>  1.0G  0  1.0G  0%  /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount
- Check the SELinux label of the volume mount directory:
# ls -lZd /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount
drwxrwsr-x. 2 root 1001 system_u:object_r:container_file_t:s0:c393,c462 1 Jun  9 15:47 /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~csi/<pv-name>/mount/
- Confirm the mismatch: the MCS label on the directory (e.g., s0:c393,c462) is different from the failing container's process label (s0:c91,c480).
- Repeat the above step on the node with the working pod and see that the MCS label of the container process exactly matches the MCS label on the volume mount directory.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.