Replica 2 for CephFS - Developer Preview OpenShift Data Foundation 4.16
Overview
To reduce storage overhead with CephFS when data resiliency is not a primary concern, you can opt for Replica 2. This reduces the amount of storage space used and decreases the level of fault tolerance.
There are two ways to use replica-2 for CephFS:
- Edit the existing default pool to replica-2 and use it with the default CephFS storageclass.
- Add an additional data pool with replica-2 spec and create another storageclass to use it.
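To illustrate the trade-off (numbers are illustrative, not from this article): in a replicated Ceph pool every object is stored `size` times, so usable capacity is roughly raw capacity divided by the replica size. A minimal sketch of that arithmetic, using the 6 TiB raw capacity of the example cluster shown later:

```python
# Rough sketch: usable capacity of a replicated Ceph pool as a function of
# replica size. Illustrative only; ignores Ceph overhead and full ratios.
def usable_capacity(raw_tib: float, replica_size: int) -> float:
    """Each object is stored replica_size times, so usable = raw / size."""
    return raw_tib / replica_size

raw = 6.0  # TiB of raw OSD capacity, as in the example cluster below
print(usable_capacity(raw, 3))  # replica 3 -> 2.0 TiB usable
print(usable_capacity(raw, 2))  # replica 2 -> 3.0 TiB usable
```

Moving from replica 3 to replica 2 raises usable capacity from about one third to one half of raw, but the pool can then lose only one copy of each object before data is at risk.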
1. Editing the existing default pool to replica-2
Procedure
- Patch the storagecluster to change the default CephFS data pool to replica-2.
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/dataPoolSpec/replicated/size", "value": 2 }]'
storagecluster.ocs.openshift.io/ocs-storagecluster patched
$ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o=jsonpath='{.spec.dataPools}' | jq
[
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  }
]
- Check the pool details in the tool box pod.
sh-5.1$ ceph osd pool ls | grep filesystem
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephfilesystem-data0
sh-5.1$ ceph osd pool ls detail | grep filesystem-data
pool 4 'ocs-storagecluster-cephfilesystem-data0' replicated size 2 min_size 1 crush_rule 3 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 32 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs read_balance_score 2.99
- Verify by creating a CephFS persistent volume claim (PVC), mounting it on a pod, and using the volume.
- Create a CephFS PVC and a pod to use it.
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: normal-cephfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs
EOF
~ $ oc get pvc | grep normal
normal-cephfs-pvc Bound pvc-de46cae3-c2eb-4e31-885a-2eaa1bdb62f8 10Gi RWO ocs-storagecluster-cephfs <unset> 11s
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal-cephfs-pod
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  volumes:
    - name: cephfs-storage
      persistentVolumeClaim:
        claimName: normal-cephfs-pvc
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: cephfs-storage
EOF
~ $ oc get pods | grep normal
normal-cephfs-pod 1/1 Running 0 31s
- Remotely access the pod using the rsh command and use the mounted volume.
$ oc rsh normal-cephfs-pod
# cd /usr/share/nginx/html
# ls -lh
total 0
# tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 100000000 >file.txt
# ls -lh
total 9.5G
-rw-r--r--. 1 root root 9.5G Jun 26 10:14 file.txt
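The 9.5G figure is consistent with the generator command: `fold -w100` emits 100 characters plus a newline per line, and `head` keeps 100,000,000 lines. A quick sanity check of that arithmetic:

```python
# Expected size of file.txt produced by:
#   tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100 | head -n 100000000 > file.txt
lines = 100_000_000
bytes_per_line = 100 + 1           # 100 characters + trailing newline
total_bytes = lines * bytes_per_line
print(total_bytes / 2**30)         # ~9.4 GiB, displayed as 9.5G by ls -lh
```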
- Check the data distribution.
sh-5.1$ ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 6.00000 - 6 TiB 19 GiB 19 GiB 3 KiB 146 MiB 6.0 TiB 0.31 1.00 - root default
-5 6.00000 - 6 TiB 19 GiB 19 GiB 3 KiB 146 MiB 6.0 TiB 0.31 1.00 - region us-east-1
-14 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 - zone us-east-1a
-13 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 - host ocs-deviceset-gp3-csi-1-data-06cjjm
2 ssd 2.00000 1.00000 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 4 up osd.2
-10 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 - zone us-east-1b
-9 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 - host ocs-deviceset-gp3-csi-0-data-0jlrr2
1 ssd 2.00000 1.00000 2 TiB 9.5 GiB 9.5 GiB 1 KiB 56 MiB 2.0 TiB 0.47 1.49 4 up osd.1
-4 2.00000 - 2 TiB 99 MiB 64 MiB 1 KiB 34 MiB 2.0 TiB 0.00 0.02 - zone us-east-1c
-3 2.00000 - 2 TiB 99 MiB 64 MiB 1 KiB 34 MiB 2.0 TiB 0.00 0.02 - host ocs-deviceset-gp3-csi-2-data-06sv6h
0 ssd 2.00000 1.00000 2 TiB 99 MiB 64 MiB 1 KiB 34 MiB 2.0 TiB 0.00 0.02 3 up osd.0
TOTAL 6 TiB 19 GiB 19 GiB 4.7 KiB 146 MiB 6.0 TiB 0.31
MIN/MAX VAR: 0.02/1.49 STDDEV: 0.22
Only two copies of the data are stored, on osd.1 and osd.2. No copy is stored on osd.0.
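The RAW USE totals match the replica-2 setting: each byte written through the PVC consumes two bytes of raw capacity, spread across two failure domains. A quick check against the figures in the tree above:

```python
# Sanity check of the `ceph osd df tree` output above: a 9.5 GiB file in a
# replica-2 pool should consume roughly 2 x 9.5 = 19 GiB of raw capacity,
# which matches the ~9.5 GiB on each of osd.1 and osd.2 and the 19 GiB total.
file_gib = 9.5
replica_size = 2
raw_gib = file_gib * replica_size
print(raw_gib)  # 19.0
```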
2. Adding an additional data pool with replica-2
- Edit the storagecluster spec to add the additional pool.
cephFilesystems:
  additionalDataPools:
    - name: test
      replicated:
        size: 2
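Instead of editing the spec interactively, the same change can likely be applied with `oc patch`, by analogy with the `dataPoolSpec` patch in the previous section. The JSON patch path below is an assumption based on that earlier command, so verify it against your StorageCluster CRD before use. A sketch that builds the patch payload:

```python
import json

# Hypothetical JSON patch equivalent of the spec edit above. The path is
# assumed by analogy with /spec/managedResources/cephFilesystems/dataPoolSpec
# used earlier in this article -- verify it against your StorageCluster CRD.
patch = [{
    "op": "add",
    "path": "/spec/managedResources/cephFilesystems/additionalDataPools",
    "value": [{"name": "test", "replicated": {"size": 2}}],
}]
print(json.dumps(patch))

# Usage (hypothetical):
#   oc patch storagecluster ocs-storagecluster -n openshift-storage \
#     --type json --patch "$(python3 build_patch.py)"
```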
$ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o=jsonpath='{.spec.dataPools}' | jq
[
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  },
  {
    "application": "",
    "deviceClass": "ssd",
    "erasureCoded": {
      "codingChunks": 0,
      "dataChunks": 0
    },
    "failureDomain": "zone",
    "mirroring": {},
    "name": "test",
    "quotas": {},
    "replicated": {
      "replicasPerFailureDomain": 1,
      "size": 2,
      "targetSizeRatio": 0.49
    },
    "statusCheck": {
      "mirror": {}
    }
  }
]
- Check the pool details in the tool box pod.
sh-5.1$ ceph osd pool ls | grep filesystem
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephfilesystem-test
sh-5.1$ ceph osd pool ls detail | grep filesystem-test
pool 5 'ocs-storagecluster-cephfilesystem-test' replicated size 2 min_size 1 crush_rule 4 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 38 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs read_balance_score 2.99
- Create a storageclass to use the additional data pool.
~ $ cat <<EOF | oc create -f -
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    description: Provides RWO and RWX Filesystem volumes for additional cephfs data pool
  name: ocs-storagecluster-cephfs-additional
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  fsName: ocs-storagecluster-cephfilesystem
  pool: ocs-storagecluster-cephfilesystem-test
provisioner: openshift-storage.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
~ $ oc get storageclass | grep cephfs
ocs-storagecluster-cephfs openshift-storage.cephfs.csi.ceph.com Delete Immediate true 71m
ocs-storagecluster-cephfs-additional openshift-storage.cephfs.csi.ceph.com Delete Immediate true 57s
- Verify by creating a CephFS PVC, mounting it in a pod, and using the volume.
- Create a CephFS PVC and a pod to use the PVC.
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: normal-cephfs-pvc-additional
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs-additional
EOF
~ $ oc get pvc | grep normal
normal-cephfs-pvc Bound pvc-de46cae3-c2eb-4e31-885a-2eaa1bdb62f8 10Gi RWO ocs-storagecluster-cephfs <unset> 23m
normal-cephfs-pvc-additional Bound pvc-e5756895-54e3-4932-bcca-6e4fd6c1fac5 10Gi RWO ocs-storagecluster-cephfs <unset> 95s
~ $ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal-cephfs-pod-additional
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  volumes:
    - name: cephfs-storage
      persistentVolumeClaim:
        claimName: normal-cephfs-pvc-additional
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: cephfs-storage
EOF
~ $ oc get pods | grep normal
normal-cephfs-pod 1/1 Running 0 23m
normal-cephfs-pod-additional 1/1 Running 0 14s
- Remotely access the pod using the rsh command and use the mounted volume.
~ $ oc rsh normal-cephfs-pod-additional
# cd /usr/share/nginx/html
# ls -lh
total 0
# tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 100000000 >file.txt
# ls -lh
total 9.5G
-rw-r--r--. 1 root root 9.5G Jun 26 11:00 file.txt
- Check the data distribution.
sh-5.1$ ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 6.00000 - 6 TiB 38 GiB 38 GiB 3 KiB 258 MiB 6.0 TiB 0.62 1.00 - root default
-5 6.00000 - 6 TiB 38 GiB 38 GiB 3 KiB 258 MiB 6.0 TiB 0.62 1.00 - region us-east-1
-14 2.00000 - 2 TiB 9.6 GiB 9.5 GiB 1 KiB 74 MiB 2.0 TiB 0.47 0.75 - zone us-east-1a
-13 2.00000 - 2 TiB 9.6 GiB 9.5 GiB 1 KiB 74 MiB 2.0 TiB 0.47 0.75 - host ocs-deviceset-gp3-csi-1-data-06cjjm
2 ssd 2.00000 1.00000 2 TiB 9.6 GiB 9.5 GiB 1 KiB 74 MiB 2.0 TiB 0.47 0.75 4 up osd.2
-10 2.00000 - 2 TiB 19 GiB 19 GiB 1 KiB 132 MiB 2.0 TiB 0.93 1.50 - zone us-east-1b
-9 2.00000 - 2 TiB 19 GiB 19 GiB 1 KiB 132 MiB 2.0 TiB 0.93 1.50 - host ocs-deviceset-gp3-csi-0-data-0jlrr2
1 ssd 2.00000 1.00000 2 TiB 19 GiB 19 GiB 1 KiB 132 MiB 2.0 TiB 0.93 1.50 5 up osd.1
-4 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 52 MiB 2.0 TiB 0.47 0.75 - zone us-east-1c
-3 2.00000 - 2 TiB 9.5 GiB 9.5 GiB 1 KiB 52 MiB 2.0 TiB 0.47 0.75 - host ocs-deviceset-gp3-csi-2-data-06sv6h
0 ssd 2.00000 1.00000 2 TiB 9.5 GiB 9.5 GiB 1 KiB 52 MiB 2.0 TiB 0.47 0.75 4 up osd.0
TOTAL 6 TiB 38 GiB 38 GiB 4.7 KiB 258 MiB 6.0 TiB 0.62
MIN/MAX VAR: 0.75/1.50 STDDEV: 0.22
The increase in OSD usage shows that only two copies of the new data were stored, on osd.0 and osd.1. No copy is stored on osd.2.