Replica 2 for CephFS - Developer Preview OpenShift Data Foundation 4.16

Overview

To reduce storage overhead with CephFS when data resiliency is not a primary concern, you can opt for replica 2. This stores two copies of the data instead of the default three, which reduces the amount of storage space used but also decreases the level of fault tolerance: the data can survive the loss of only one copy instead of two.
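As a rough illustration of the trade-off (assuming a hypothetical 6 TiB of raw capacity and ignoring metadata and other overhead), halving the number of extra copies changes both usable capacity and the number of copy losses the data can survive:

```python
# Hedged sketch: usable capacity of a replicated pool, ignoring metadata
# and other overhead. The 6 TiB raw figure is a hypothetical example.
RAW_TIB = 6

def usable_tib(raw_tib: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tib / replicas

# Replica 3 (the default): 3 copies, survives the loss of 2 copies.
print(usable_tib(RAW_TIB, 3))  # 2.0 TiB usable
# Replica 2: 2 copies, survives the loss of only 1 copy.
print(usable_tib(RAW_TIB, 2))  # 3.0 TiB usable
```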

There are two ways to use replica-2 for CephFS:

  1. Edit the existing default pool to replica-2 and use it with the default CephFS storageclass.
  2. Add an additional data pool with replica-2 spec and create another storageclass to use it.

Editing the existing default pool to replica-2

Procedure

  1. Patch the storagecluster to change the default CephFS data pool to replica-2.
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/dataPoolSpec/replicated/size", "value": 2 }]'
storagecluster.ocs.openshift.io/ocs-storagecluster patched
$ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o=jsonpath='{.spec.dataPools}' | jq
[
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  }
]
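If you want to script this check rather than read the jq output by hand, a minimal sketch (parsing the same JSON that `oc get cephfilesystem ... -o=jsonpath='{.spec.dataPools}'` returns, abridged here to the relevant fields) could look like:

```python
import json

# Abridged sample of the dataPools JSON shown above; in practice you would
# feed in the full output of the oc get cephfilesystem command.
data_pools = json.loads("""
[{"replicated": {"replicasPerFailureDomain": 1, "size": 2}}]
""")

# Confirm every data pool now carries the replica-2 setting.
sizes = [pool["replicated"]["size"] for pool in data_pools]
assert all(s == 2 for s in sizes), f"unexpected replica sizes: {sizes}"
print(sizes)  # [2]
```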
  2. Check the pool details in the toolbox pod.
sh-5.1$ ceph osd pool ls | grep filesystem
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephfilesystem-data0

sh-5.1$ ceph osd pool ls detail | grep filesystem-data
pool 4 'ocs-storagecluster-cephfilesystem-data0' replicated size 2 min_size 1 crush_rule 3 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 32 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs read_balance_score 2.99
  3. Verify by creating a CephFS persistent volume claim (PVC), mounting it on a pod, and using the volume.

  • Create a CephFS PVC and a pod to use it.

~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: normal-cephfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs  
EOF
~ $ oc get pvc | grep normal
normal-cephfs-pvc                     Bound    pvc-de46cae3-c2eb-4e31-885a-2eaa1bdb62f8   10Gi       RWO            ocs-storagecluster-cephfs     <unset>                 11s
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal-cephfs-pod
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  volumes:
    - name: cephfs-storage
      persistentVolumeClaim:
        claimName: normal-cephfs-pvc
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: cephfs-storage
EOF
~ $ oc get pods | grep normal
normal-cephfs-pod                                                 1/1     Running     0          31s
  4. Remotely access the pod using the rsh command and use the mounted volume.
$ oc rsh normal-cephfs-pod
# cd /usr/share/nginx/html
# ls -lh
total 0
# tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 100000000 >file.txt
# ls -lh
total 9.5G
-rw-r--r--. 1 root root 9.5G Jun 26 10:14 file.txt
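The reported size matches what the pipeline above should produce: `fold -w100` emits 100 characters plus a newline per line, and `head -n 100000000` keeps 10^8 such lines. A quick sanity check of that arithmetic:

```python
# Sanity check: expected size of file.txt from the tr | fold | head pipeline.
bytes_per_line = 100 + 1          # 100 characters plus the trailing newline
lines = 100_000_000               # head -n 100000000
total_bytes = bytes_per_line * lines
gib = total_bytes / 2**30
print(round(gib, 1))  # 9.4 GiB, which ls -lh rounds up to 9.5G
```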
  5. Check the data distribution.
sh-5.1$ ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP     META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME                                           
 -1         6.00000         -  6 TiB   19 GiB   19 GiB    3 KiB  146 MiB  6.0 TiB  0.31  1.00    -          root default                                        
 -5         6.00000         -  6 TiB   19 GiB   19 GiB    3 KiB  146 MiB  6.0 TiB  0.31  1.00    -              region us-east-1                                
-14         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    -                  zone us-east-1a                             
-13         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    -                      host ocs-deviceset-gp3-csi-1-data-06cjjm
  2    ssd  2.00000   1.00000  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    4      up                  osd.2                               
-10         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    -                  zone us-east-1b                             
 -9         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    -                      host ocs-deviceset-gp3-csi-0-data-0jlrr2
  1    ssd  2.00000   1.00000  2 TiB  9.5 GiB  9.5 GiB    1 KiB   56 MiB  2.0 TiB  0.47  1.49    4      up                  osd.1                               
 -4         2.00000         -  2 TiB   99 MiB   64 MiB    1 KiB   34 MiB  2.0 TiB  0.00  0.02    -                  zone us-east-1c                             
 -3         2.00000         -  2 TiB   99 MiB   64 MiB    1 KiB   34 MiB  2.0 TiB  0.00  0.02    -                      host ocs-deviceset-gp3-csi-2-data-06sv6h
  0    ssd  2.00000   1.00000  2 TiB   99 MiB   64 MiB    1 KiB   34 MiB  2.0 TiB  0.00  0.02    3      up                  osd.0                               
                        TOTAL  6 TiB   19 GiB   19 GiB  4.7 KiB  146 MiB  6.0 TiB  0.31                                                                         
MIN/MAX VAR: 0.02/1.49  STDDEV: 0.22

Only two copies of the data are stored, on osd.1 and osd.2. No copy is stored on osd.0.
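The raw usage is consistent with replica 2: each of the two copies of the ~9.5 GiB file lands on one OSD, so the cluster-wide RAW USE grows by roughly twice the file size. A quick check of that arithmetic:

```python
# Raw-usage sanity check for a replica-2 pool (figures from the output above).
file_gib = 9.5
replicas = 2
expected_raw_gib = file_gib * replicas
print(expected_raw_gib)  # 19.0, matching the ~19 GiB TOTAL RAW USE reported
```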

Adding an additional data pool with replica-2

Procedure

  1. Edit the storagecluster spec to add the additional data pool.
cephFilesystems:
  additionalDataPools:
  - name: test
    replicated:
      size: 2
$ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o=jsonpath='{.spec.dataPools}' | jq
[
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  },
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "name": "test",
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  }
]
  2. Check the pool details in the toolbox pod.
sh-5.1$ ceph osd pool ls | grep filesystem
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephfilesystem-test

sh-5.1$ ceph osd pool ls detail | grep filesystem-test
pool 5 'ocs-storagecluster-cephfilesystem-test' replicated size 2 min_size 1 crush_rule 4 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 38 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs read_balance_score 2.99
  3. Create a storageclass to use the additional data pool.
~ $ cat <<EOF | oc create -f -
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    description: Provides RWO and RWX Filesystem volumes for additional cephfs data pool
  name: ocs-storagecluster-cephfs-additional
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  fsName: ocs-storagecluster-cephfilesystem
  pool: ocs-storagecluster-cephfilesystem-test
provisioner: openshift-storage.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
 ~ $ oc get storageclass | grep cephfs
ocs-storagecluster-cephfs              openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   71m
ocs-storagecluster-cephfs-additional   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   57s
  4. Verify by creating a CephFS PVC, mounting it in a pod, and using the volume.
  • Create a CephFS PVC and a pod to use the PVC.
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: normal-cephfs-pvc-additional
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs-additional
EOF
~ $ oc get pvc | grep normal
normal-cephfs-pvc                     Bound    pvc-de46cae3-c2eb-4e31-885a-2eaa1bdb62f8   10Gi       RWO            ocs-storagecluster-cephfs              <unset>                 23m
normal-cephfs-pvc-additional          Bound    pvc-e5756895-54e3-4932-bcca-6e4fd6c1fac5   10Gi       RWO            ocs-storagecluster-cephfs-additional   <unset>                 95s
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal-cephfs-pod-additional
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  volumes:
    - name: cephfs-storage
      persistentVolumeClaim:
        claimName: normal-cephfs-pvc-additional
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: cephfs-storage
EOF
~ $ oc get pods | grep normal
normal-cephfs-pod                                                 1/1     Running     0             23m
normal-cephfs-pod-additional                                      1/1     Running     0             14s
  5. Remotely access the pod using the rsh command and use the mounted volume.
 ~ $ oc rsh normal-cephfs-pod-additional
# cd /usr/share/nginx/html
# ls -lh
total 0
# tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 100000000 >file.txt
# ls -lh
total 9.5G
-rw-r--r--. 1 root root 9.5G Jun 26 11:00 file.txt
  6. Check the data distribution.
sh-5.1$ ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP     META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME                                           
 -1         6.00000         -  6 TiB   38 GiB   38 GiB    3 KiB  258 MiB  6.0 TiB  0.62  1.00    -          root default                                        
 -5         6.00000         -  6 TiB   38 GiB   38 GiB    3 KiB  258 MiB  6.0 TiB  0.62  1.00    -              region us-east-1                                
-14         2.00000         -  2 TiB  9.6 GiB  9.5 GiB    1 KiB   74 MiB  2.0 TiB  0.47  0.75    -                  zone us-east-1a                             
-13         2.00000         -  2 TiB  9.6 GiB  9.5 GiB    1 KiB   74 MiB  2.0 TiB  0.47  0.75    -                      host ocs-deviceset-gp3-csi-1-data-06cjjm
  2    ssd  2.00000   1.00000  2 TiB  9.6 GiB  9.5 GiB    1 KiB   74 MiB  2.0 TiB  0.47  0.75    4      up                  osd.2                               
-10         2.00000         -  2 TiB   19 GiB   19 GiB    1 KiB  132 MiB  2.0 TiB  0.93  1.50    -                  zone us-east-1b                             
 -9         2.00000         -  2 TiB   19 GiB   19 GiB    1 KiB  132 MiB  2.0 TiB  0.93  1.50    -                      host ocs-deviceset-gp3-csi-0-data-0jlrr2
  1    ssd  2.00000   1.00000  2 TiB   19 GiB   19 GiB    1 KiB  132 MiB  2.0 TiB  0.93  1.50    5      up                  osd.1                               
 -4         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   52 MiB  2.0 TiB  0.47  0.75    -                  zone us-east-1c                             
 -3         2.00000         -  2 TiB  9.5 GiB  9.5 GiB    1 KiB   52 MiB  2.0 TiB  0.47  0.75    -                      host ocs-deviceset-gp3-csi-2-data-06sv6h
  0    ssd  2.00000   1.00000  2 TiB  9.5 GiB  9.5 GiB    1 KiB   52 MiB  2.0 TiB  0.47  0.75    4      up                  osd.0                               
                        TOTAL  6 TiB   38 GiB   38 GiB  4.7 KiB  258 MiB  6.0 TiB  0.62                                                                         
MIN/MAX VAR: 0.75/1.50  STDDEV: 0.22

From the size increase of the OSDs, we can see that only two copies of the data were stored, on osd.0 and osd.1. No copy is stored on osd.2.
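As in the first procedure, the totals line up with replica 2: the cluster now holds two ~9.5 GiB files (one per PVC), each stored twice:

```python
# Combined raw usage after writing a second ~9.5 GiB file to a replica-2 pool.
files_gib = [9.5, 9.5]   # one file per PVC, figures from the outputs above
replicas = 2
total_raw_gib = sum(files_gib) * replicas
print(total_raw_gib)  # 38.0, matching the ~38 GiB TOTAL RAW USE reported
```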
