How to share the same volume between namespaces using ODF 4
Environment
- Red Hat OpenShift Container Platform (OCP) with Red Hat OpenShift Data Foundation (ODF) 4.9 or 4.10
- Red Hat OpenShift Container Platform (OCP) with Red Hat OpenShift Container Storage (OCS) 4.8
Issue
- For an application with parts that reside in different namespaces and that need to consume the same shared volume, an NFS or CIFS server external to OCP is normally used. However, in an OCP cluster that has ODF, is it possible to use ODF to share the same volume between the different namespaces of this application?
- If such sharing is possible using ODF, what are the implications of this?
Resolution
Disclaimer: This configuration should not be used without careful analysis. There are better ways to deal with applications that consume the same volume, such as object storage (S3), and keeping the application in a single namespace is simpler than the configuration described in this article. The information is provided as-is, without representations or warranties about its suitability or accuracy; use of the information below is at the user's own risk.
Important: On clusters from OCP 4.8 to 4.10, it is recommended to define a pod anti-affinity rule to keep pods from different namespaces that use the same CephFS volume from running on the same node.
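A minimal sketch of such an anti-affinity rule, to be added to each deployment's pod template (spec.template.spec). The label key and value here are assumptions for illustration; both namespaces' pods must carry the same label for the rule to take effect:

```yaml
# Sketch only: assumes the pods in both namespaces are labeled with the
# hypothetical label manual-shared-cephfs-volume: "True".
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          manual-shared-cephfs-volume: "True"
      namespaces:
      - test-01
      - test-02
      topologyKey: kubernetes.io/hostname
```

With topologyKey kubernetes.io/hostname, the scheduler refuses to place two matching pods on the same node, which is exactly the failure mode described in the Root Cause section below.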
This article breaks the necessary configuration down into five steps, as shown below:
1. Storage Class
A dedicated storage class with reclaimPolicy set to Retain and volumeBindingMode set to WaitForFirstConsumer is recommended. This is not mandatory, but it helps prevent accidental deletion of the volume and accidental binding between the PVC and an existing or automatically created PV. Some values may differ between clusters, so check the fields of the storage class ocs-storagecluster-cephfs on the cluster:
# echo "
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: manual-shared-cephfs-volume
provisioner: openshift-storage.cephfs.csi.ceph.com
parameters:
  clusterID: openshift-storage
  fsName: ocs-storagecluster-cephfilesystem
  mounter: fuse
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer " | oc create -f -
2. Creating the namespace, deployment and other related objects
2.1. Two namespaces will be used in this example:
# oc new-project test-01
# oc new-project test-02
2.2. Collect the sa.scc.mcs and sa.scc.supplemental-groups values from the first namespace (test-01). These values will later be used for the SCC and the securityContext:
# oc describe project test-01 | egrep 'sa.scc.mcs|scc.supplemental-groups'
openshift.io/sa.scc.mcs=s0:c27,c24
openshift.io/sa.scc.supplemental-groups=1000750000/10000
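To illustrate how these two values feed into the later steps (the MCS label becomes seLinuxOptions.level in step 4.3, and the first number of the supplemental-groups range becomes fsGroup and the SCC range in step 4.1), here is an offline sketch that parses a captured sample of the output above instead of querying a live cluster:

```shell
# Offline demonstration only: parses sample 'oc describe project' output.
describe_output='openshift.io/sa.scc.mcs=s0:c27,c24
openshift.io/sa.scc.supplemental-groups=1000750000/10000'

# MCS label: used later as seLinuxOptions.level
MCS=$(echo "$describe_output" | awk -F= '/sa.scc.mcs/ {print $2}')
# First value of the supplemental-groups range (start/size): used later as fsGroup
GID=$(echo "$describe_output" | awk -F= '/supplemental-groups/ {print $2}' | cut -d/ -f1)

echo "$MCS"   # s0:c27,c24
echo "$GID"   # 1000750000
```

Note that the supplemental-groups annotation is a start/size pair, so only the value before the slash is a usable group ID.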
2.3. Create the application:
# oc new-app --name deploy-01 registry.access.redhat.com/rhscl/httpd-24-rhel7 -n test-01
3. PV and PVC
3.1. This first PVC can be requested in any convenient way, since the PV is created dynamically:
# oc project test-01
# oc set volumes deployment/deploy-01 --claim-name base-pvc-for-share --claim-size=1G --claim-class=manual-shared-cephfs-volume --claim-mode=rwm --mount-path /opt/rh/httpd24/root/var/www/html --add -n test-01
3.2. Check that the PVC is bound to a PV, and store the PV name in a variable for the next steps:
# oc get pvc base-pvc-for-share -n test-01 -w
# PV_NAME=$(oc get pvc base-pvc-for-share -n test-01 -o jsonpath='{.spec.volumeName}')
3.3. Label the PV, PVC and deployment for identification. Again, this is not required for the configuration to work, but it will simplify administration:
# oc label pv $PV_NAME manual-shared-cephfs-volume=SOURCE
# oc label pvc base-pvc-for-share manual-shared-cephfs-volume=$PV_NAME
# oc label deployment/deploy-01 manual-shared-cephfs-volume=$PV_NAME
3.4. Extract the PV's yaml. Using egrep -v removes some unnecessary fields from the output:
# oc get pv $PV_NAME -o yaml | egrep -v " uid:| resourceVersion:| creationTimestamp:| kubernetes.io/pv-protection| pv.kubernetes.io" > PV-test-02.yaml
3.5. The following fields must be updated:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: NEW_NAME_FOR_PV ## Using cephfs-test-02
  labels:
    manual-shared-cephfs-volume: $PV_NAME ## Name of the original PV
spec:
  claimRef:
    name: NAME_NEW_PVC ## Using cephfs-test-02
    namespace: NAMESPACE_NEW_PVC ## Using test-02
  csi:
    .....
  storageClassName: manual-shared-cephfs-volume
  persistentVolumeReclaimPolicy: Retain
...
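The edits above can also be scripted. The following is a sketch, demonstrated on a minimal stand-in for the extracted yaml; the field layout and the PV name are assumptions for illustration (in the real procedure, PV-test-02.yaml comes from step 3.4 and PV_NAME from step 3.2), so always review the resulting file before creating the PV:

```shell
# Sketch only: demonstrates the edits from step 3.5 on a minimal sample file.
PV_NAME=pvc-0000-example   # hypothetical; comes from step 3.2 in the procedure

cat > PV-test-02.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${PV_NAME}
spec:
  claimRef:
    name: base-pvc-for-share
    namespace: test-01
  storageClassName: manual-shared-cephfs-volume
  persistentVolumeReclaimPolicy: Retain
EOF

# Rename the PV and retarget its claimRef to the new PVC and namespace:
sed -i \
  -e "s/^  name: ${PV_NAME}\$/  name: cephfs-test-02/" \
  -e 's/^    name: base-pvc-for-share$/    name: cephfs-test-02/' \
  -e 's/^    namespace: test-01$/    namespace: test-02/' \
  PV-test-02.yaml

cat PV-test-02.yaml
```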
3.6. Create the PV. The status should be Available until the creation of the PVC:
# oc create -f PV-test-02.yaml
# oc get pv -l manual-shared-cephfs-volume=$PV_NAME -w
3.7. Now proceed with the creation of the second namespace. Create the application and PVC for the new PV:
# oc new-app --name deploy-02 registry.access.redhat.com/rhscl/httpd-24-rhel7 --labels manual-shared-cephfs-volume=True -n test-02
# oc set volumes deployment/deploy-02 --claim-name cephfs-test-02 --claim-size=1G --claim-class=manual-shared-cephfs-volume --claim-mode=rwm --mount-path /opt/rh/httpd24/root/var/www/html --add -n test-02
3.8. Since the PV already has the related PVC and namespace defined, the status of the PV and PVC should change to Bound:
# oc get pv,pvc -l manual-shared-cephfs-volume=$PV_NAME
4. Security Context
4.1. The shared configuration is done, but the permissions must be adjusted. To do so, a custom SCC and a serviceaccount must be created, and the workload (of any kind: deployment, deploymentconfig, cronjob, etc.) must be set to use that serviceaccount. The SCC allows the use of any seLinuxContext, and of the supplementalGroups range (sa.scc.supplemental-groups) from the namespace of the first PVC, in this example test-01. This SCC also restricts execution to non-root users; if running as root is necessary, this must be adjusted:
# echo "
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: cephfs-share-test-01
priority: null
readOnlyRootFilesystem: false
allowPrivilegedContainer: false
requiredDropCapabilities: null
fsGroup:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
runAsUser:
  type: MustRunAsNonRoot
supplementalGroups:
  type: MustRunAs
  ranges:
  - min: 1000750000
    max: 1000750001
seccompProfiles:
- '*'
users:
- system:serviceaccount:test-02:cephfs-share-0-test-01
volumes:
- '*' " | oc create -f -
4.2. Create the serviceaccount:
# oc create serviceaccount cephfs-share-0-test-01 -n test-02
# oc adm policy add-scc-to-user cephfs-share-test-01 system:serviceaccount:test-02:cephfs-share-0-test-01
# oc set serviceaccount deployment/deploy-02 cephfs-share-0-test-01 -n test-02
4.3. Set the deployment's securityContext, using the sa.scc.mcs and sa.scc.supplemental-groups values collected in step 2.2:
kind: Deployment
apiVersion: apps/v1
spec:
  ....
  template:
    ...
    spec:
      ...
      securityContext:
        fsGroup: 1000750000
        seLinuxOptions:
          level: 's0:c27,c24'
      ...
5. Listing the objects
Listing the objects related to the original PV:
# oc get pod,deployment,pv,pvc -A -l manual-shared-cephfs-volume=$PV_NAME
This is the last step; the volume is now shared between the namespaces.
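As an optional functional check, write a file from one namespace and read it from the other. This is a sketch that assumes the example deployment names and mount path used above, and it requires both pods to be Running:

```shell
# Write from the pod in test-01, then read from the pod in test-02:
oc exec -n test-01 deployment/deploy-01 -- \
  sh -c 'echo "shared data" > /opt/rh/httpd24/root/var/www/html/shared-test.txt'
oc exec -n test-02 deployment/deploy-02 -- \
  cat /opt/rh/httpd24/root/var/www/html/shared-test.txt
```

If the second command prints the text written by the first, both namespaces are consuming the same CephFS volume.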
Related Articles:
How to share the same CephFS volume across two external ODF clusters?
Root Cause
Why is it necessary to implement anti-affinity on clusters before version 4.11? When pods from different namespaces, whose PVs/PVCs point to the same CephFS volume, try to mount it on a node where that volume is already in use, the last pod to start is unable to mount the volume. In addition, once a pod fails to mount a volume because it is already in use, it blocks every other pod from mounting volumes on that node until the failing pod is removed from the node.
Implementing anti-affinity is a workaround that lets the pods be scheduled on different nodes, where they are able to mount the necessary volumes. This workaround is not effective for daemonsets, whose pods are scheduled on all nodes. The anti-affinity is not necessary on clusters installed as, or updated to, 4.11 or newer.
The table below shows the combinations and their results for pods running on the same node (each entry is namespace / pod / volume):
| Namespace / pod / volume | OK on the same node | Fails on the same node |
|---|---|---|
| NAMESPACE-A / ANY_POD / VOLUME-A | NAMESPACE-A / ANY_POD / VOLUME-A; NAMESPACE-A / ANY_POD / VOL-NOT-A; NAMESPACE-ANY / ANY_POD / VOLUME-B | NAMESPACE-B / ANY_POD / VOLUME-A; NAMESPACE-NOT-A / ANY_POD / VOLUME-A |
| NAMESPACE-B / ANY_POD / VOLUME-B | NAMESPACE-A / ANY_POD / VOL-NOT-B; NAMESPACE-B / ANY_POD / VOLUME-B; NAMESPACE-ANY / ANY_POD / VOL-NOT-B | NAMESPACE-A / ANY_POD / VOLUME-B; NAMESPACE-NOT-B / ANY_POD / VOLUME-B |
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.