OpenShift Data Foundation external mode support for RBD/block Replica-1 pools and topology awareness - Developer preview ODF 4.16
Overview
Applications such as Kafka deploy multiple running instances. Each service instance creates its own claim and is expected to be scheduled in a different zone. Because the application provides its own redundant instances, there is no requirement for redundancy at the data layer. A storage class is created that provisions storage from replica-1 Ceph pools, one located in each of the separate zones.
Configuration flags
Add the required flags to the create-external-cluster-resources.py script:
| Flag | Description |
|---|---|
| --topology-pools | (optional) Comma-separated list of topology-constrained RBD pools |
| --topology-failure-domain-label | (optional) K8s cluster failure domain label (for example: zone, rack, or host) for the topology pools that match the Ceph domain |
| --topology-failure-domain-values | (optional) Comma-separated list of the K8s cluster failure domain values corresponding to each of the pools in the topology-pools list |
OCS Operator will then create a new storage class named ceph-rbd-topology.
Example configuration
Ceph cluster
- Determine the names of the zones (or other failure domains) in the Ceph CRUSH map where each of the pools will have corresponding CRUSH rules. For more information about CRUSH, see CRUSH location.
- Create a zone-specific CRUSH rule for each of the pools. For example, this is a CRUSH rule for zone-a:
$ ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
{
    "rule_id": 5,
    "rule_name": "rule_host-zone-a-hdd",
    "type": 1,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "zone-a~hdd"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
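As an illustration, a rule like the one dumped above could be created and verified with the following commands against a running Ceph cluster; the rule name, the zone-a root bucket, the osd failure domain, and the hdd device class are taken from the dump above:

```shell
# Create a replicated rule rooted at the zone-a bucket, restricted to hdd OSDs,
# with osd as the failure domain (matching the dump above)
ceph osd crush rule create-replicated rule_host-zone-a-hdd zone-a osd hdd

# Inspect the resulting rule
ceph osd crush rule dump rule_host-zone-a-hdd
```

Repeat for each zone (zone-b, zone-c, and so on) with the corresponding bucket name.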
- Create replica-1 pools based on each of the CRUSH rules from the previous step. Each pool must be created with a CRUSH rule to limit the pool to OSDs in a specific zone.
  WARNING: Disable the Ceph warning for replica-1 pools:
  $ ceph config set global mon_allow_pool_size_one true
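As a sketch, pool-a for zone-a might be created like this on the Ceph cluster; the PG count and rule name are assumptions following the examples in this document (repeat for pool-b and pool-c with their own zone rules):

```shell
# Create the pool using the zone-restricted CRUSH rule
ceph osd pool create pool-a 32 32 replicated rule_host-zone-a-hdd

# Reduce the pool to a single replica (requires mon_allow_pool_size_one=true)
ceph osd pool set pool-a size 1 --yes-i-really-mean-it

# Mark the pool for RBD use and initialize it
ceph osd pool application enable pool-a rbd
rbd pool init pool-a
```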
- Determine the zones in the K8s cluster that correspond to each of the Ceph pools. The K8s nodes require labels as defined with the OSD topology labels. Some environments already have nodes labeled with zones.
- Set the topology labels on the nodes if not already present.
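For example, assuming the standard Kubernetes topology.kubernetes.io/zone label (verify which label key your environment uses) and hypothetical node names:

```shell
kubectl label node worker-a topology.kubernetes.io/zone=zone-a
kubectl label node worker-b topology.kubernetes.io/zone=zone-b
kubectl label node worker-c topology.kubernetes.io/zone=zone-c
```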
- Set the flags of the external cluster configuration script based on the pools and failure domains:
  - --topology-pools=pool-a,pool-b,pool-c
  - --topology-failure-domain-label=zone
  - --topology-failure-domain-values=zone-a,zone-b,zone-c
- Run the Python script to generate the settings that will be imported to the Rook cluster:
python3 create-external-cluster-resources.py --rbd-data-pool-name replicapool --topology-pools pool-a,pool-b,pool-c --topology-failure-domain-label zone --topology-failure-domain-values zone-a,zone-b,zone-c
Output:
[{"name": "rook-ceph-mon-endpoints", "kind": "ConfigMap", "data": {"data": "osd-1=10.1.112.32:6789", "maxMonId": "0", "mapping": "{}"}}, {"name": "rook-ceph-mon", "kind": "Secret", "data": {"admin-secret": "admin-secret", "fsid": "8f01d842-d4b2-11ee-b43c-0050568fb522", "mon-secret": "mon-secret"}}, {"name": "rook-ceph-operator-creds", "kind": "Secret", "data": {"userID": "client.healthchecker", "userKey": "AQDyAd9lfiDOLxAAcvvOwLW6HESKqzjHUIrhwg=="}}, {"name": "monitoring-endpoint", "kind": "CephCluster", "data": {"MonitoringEndpoint": "10.1.112.46", "MonitoringPort": "9283"}}, {"name": "rook-csi-rbd-node", "kind": "Secret", "data": {"userID": "csi-rbd-node", "userKey": "AQDyAd9lNTDAMBAA3iXWRho6V0YYdP1Xgm++tQ=="}}, {"name": "rook-csi-rbd-provisioner", "kind": "Secret", "data": {"userID": "csi-rbd-provisioner", "userKey": "AQDyAd9l+z+vMRAAoTJpMnNqhrAvA6x4Xw038g=="}}, {"name": "rook-csi-cephfs-provisioner", "kind": "Secret", "data": {"adminID": "csi-cephfs-provisioner", "adminKey": "AQDyAd9lK1ykMxAA3iWWyYmm4omJTXDzLegtDw=="}}, {"name": "rook-csi-cephfs-node", "kind": "Secret", "data": {"adminID": "csi-cephfs-node", "adminKey": "AQDyAd9l6MG5MhAAf+hno2rMc3S8de0AoVsiGQ=="}}, {"name": "rook-ceph-dashboard-link", "kind": "Secret", "data": {"userID": "ceph-dashboard-link", "userKey": "https://10.1.112.46:8443/"}}, {"name": "ceph-rbd", "kind": "StorageClass", "data": {"pool": "nitin", "csi.storage.k8s.io/provisioner-secret-name": "rook-csi-rbd-provisioner", "csi.storage.k8s.io/controller-expand-secret-name": "rook-csi-rbd-provisioner", "csi.storage.k8s.io/node-stage-secret-name": "rook-csi-rbd-node"}}, {"name": "ceph-rbd-topology-storageclass", "kind": "StorageClass", "data": {"topologyFailureDomainLabel": "zone", "topologyFailureDomainValues": ["zone-a", "zone-b", "zone-c"], "topologyPools": ["pool-a", "pool-b", "pool-c"], "csi.storage.k8s.io/provisioner-secret-name": "rook-csi-rbd-provisioner", "csi.storage.k8s.io/controller-expand-secret-name": "rook-csi-rbd-provisioner", 
"csi.storage.k8s.io/node-stage-secret-name": "rook-csi-rbd-node"}}, {"name": "cephfs", "kind": "StorageClass", "data": {"fsName": "fsvol001", "pool": "cephfs.fsvol001.data", "csi.storage.k8s.io/provisioner-secret-name": "rook-csi-cephfs-provisioner", "csi.storage.k8s.io/controller-expand-secret-name": "rook-csi-cephfs-provisioner", "csi.storage.k8s.io/node-stage-secret-name": "rook-csi-cephfs-node"}}]
Kubernetes Cluster
- Check that the external cluster is created and connected as described in the installation steps. Review the new storage class:
$ kubectl get sc ceph-rbd-topology -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-03-07T12:10:19Z"
  name: ceph-rbd-topology
  resourceVersion: "82502"
  uid: 68448a14-3a78-42c5-ac29-261b6c3404af
parameters:
  ...
  ...
  topologyConstrainedPools: |
    [
      {"poolName":"pool-a",
       "domainSegments":[
         {"domainLabel":"zone","value":"zone-a"}]},
      {"poolName":"pool-b",
       "domainSegments":[
         {"domainLabel":"zone","value":"zone-b"}]},
      {"poolName":"pool-c",
       "domainSegments":[
         {"domainLabel":"zone","value":"zone-c"}]},
    ]
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
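Conceptually, the provisioner uses topologyConstrainedPools to pick the pool whose domain segment matches the zone of the node where the consuming pod is scheduled. The following is only an illustrative sketch of that lookup (not the actual CSI driver code), using the pool and zone names from this example:

```shell
# Sketch: given the zone of the scheduled pod's node, select the matching pool
# from the topologyConstrainedPools mapping shown in the storage class above.
node_zone="zone-b"
pool=$(python3 - "$node_zone" <<'EOF'
import json, sys

# The mapping from the ceph-rbd-topology storage class
pools = json.loads("""
[
  {"poolName":"pool-a","domainSegments":[{"domainLabel":"zone","value":"zone-a"}]},
  {"poolName":"pool-b","domainSegments":[{"domainLabel":"zone","value":"zone-b"}]},
  {"poolName":"pool-c","domainSegments":[{"domainLabel":"zone","value":"zone-c"}]}
]
""")

zone = sys.argv[1]
for p in pools:
    if any(s["domainLabel"] == "zone" and s["value"] == zone
           for s in p["domainSegments"]):
        print(p["poolName"])
        break
EOF
)
echo "node zone ${node_zone} -> pool ${pool}"
```

Because each pool is backed by a zone-restricted CRUSH rule, this mapping is what keeps each PVC's data inside a single zone.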
For ODF 4.19 and above, also add the topology label to the CSI drivers
Update the CSI drivers with the topology label used above (for example, failureDomain: zone):
$ oc -n openshift-storage edit driver openshift-storage.rbd.csi.ceph.com
Add the following under spec, replacing failureDomain with the failure domain label assigned in the earlier steps:
spec:
  nodePlugin:
    topology:
      domainLabels: ['failureDomain']
Create a Topology-Based PVC
The topology-based storage class is ready to be consumed. Create a PVC from the ceph-rbd-topology storage class above, and watch the OSD usage to see how the data is spread only among the topology-based CRUSH buckets.
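A minimal sketch of such a claim (the PVC name and size are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-data-0   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd-topology
  resources:
    requests:
      storage: 10Gi
```

Because the storage class uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until a pod that uses it is scheduled; the volume is then provisioned from the pool matching that node's zone.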