Deploying OpenShift Data Foundation on a Two‑Node OpenShift Cluster with Fencing and DRBD - OpenShift Data Foundation 4.21 Developer Preview

Important: A Developer Preview feature is subject to Developer Preview support limitations. Developer Preview features are not intended to be run in production environments. Clusters deployed with Developer Preview features are considered development clusters and are not supported through the Red Hat Customer Portal case management system. Developer Preview features are meant for customers who are willing to evaluate new products or product releases at an early stage of development. If you need assistance with Developer Preview features, reach out to the ocs-devpreview@redhat.com mailing list, and a member of the Red Hat development team will assist you as quickly as possible based on availability and work schedules. For more information about the support scope, see the Developer Preview support scope KCS article.

Overview

This article describes the installation and configuration of OpenShift Data Foundation (ODF) on a two‑node OpenShift cluster with fencing enabled. This architecture is suitable for distributed and edge deployments, where the goal is to minimize hardware footprint while maintaining High Availability (HA).

Target Environment: Bare metal deployments

Key Capability: Uses DRBD (Distributed Replicated Block Device) and fencing to ensure data consistency and durability across two nodes.

Prerequisites

Cluster requirements

  • Two‑Node OpenShift Cluster with Fencing Enabled
  • Verify the cluster topology:
oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}'

Expected output:
DualReplica
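The topology check above can be wrapped in a small preflight sketch. This helper is not part of the attached scripts, and the function name is illustrative:

```shell
# Preflight sketch: abort early unless the cluster reports the DualReplica
# control-plane topology that a fencing-enabled two-node cluster uses.
check_topology() {
  # $1 is the value returned by:
  #   oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}'
  if [ "$1" = "DualReplica" ]; then
    echo "OK: two-node (DualReplica) topology"
    return 0
  fi
  echo "ERROR: expected DualReplica, got '$1'" >&2
  return 1
}

# Typical use on a live cluster:
# check_topology "$(oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}')"
```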

Network Requirements

  • Port 7794 must be open and reachable between both nodes for DRBD replication.
  • Run from each node:
nc -zv <peer-node-ip> 7794

Expected output (reachable but closed):

Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connection refused.
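A "Connection refused" response is the expected result at this stage: it shows the peer is reachable even though nothing is listening on port 7794 yet. A sketch that classifies the nc output (the helper name is illustrative and not part of the attached scripts):

```shell
# Classify the combined output of `nc -zv <peer-node-ip> 7794`:
#   "open"        - DRBD is already listening on the port
#   "reachable"   - peer is up but the port is closed (expected before DRBD starts)
#   "unreachable" - anything else (e.g. a timeout) indicates a network problem
classify_nc_output() {
  case "$1" in
    *Connected*)            echo "open" ;;
    *"Connection refused"*) echo "reachable" ;;
    *)                      echo "unreachable" ;;
  esac
}

# Typical use:
# classify_nc_output "$(nc -zv "$PEER_IP" 7794 2>&1)"
```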

Storage Requirements (per node)

  • OSD Disks

    • One or more disks
    • Minimum size: 500 GB
    • Sizes must match across nodes
  • Floating Monitor Disk

    • One small disk or partition
    • Minimum: 10 GB (recommended < 50 GB)
  • List disks:

lsblk
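To pick candidate OSD disks that meet the 500 GB minimum, the lsblk output can be filtered; a sketch, assuming byte sizes as reported by lsblk -b (the helper name is illustrative). Run the same check on both nodes and confirm the sizes match:

```shell
# Print disks of at least 500 GB from `lsblk -b -d -n -o NAME,SIZE,TYPE`
# output (whole disks only, sizes in bytes).
filter_osd_disks() {
  awk '$3 == "disk" && $2 >= 500 * 1000 * 1000 * 1000 { print $1, $2 }'
}

# Typical use on a node:
# lsblk -b -d -n -o NAME,SIZE,TYPE | filter_osd_disks
```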

Procedure

Configuring DRBD

  1. Configure DRBD to set up the block device replication required for the floating monitor.

# Run the script from a shell that has kube API and oc access.
# --floating-mon-disk takes the device name of the floating monitor disk (must be present on both nodes).
bash configure-drbd.sh --floating-mon-disk /dev/sda

Use the configure-drbd.sh shell script attached to this article.

  2. Verify that DRBD is configured. Run the following command for each node, substituting the node name for $Node:
oc debug node/"$Node" -- chroot /host sudo podman run --rm --privileged -v /dev:/dev -v /etc/drbd.conf:/etc/drbd.conf -v /etc/drbd.d:/etc/drbd.d --net host --hostname "$Node" quay.io/rhceph-dev/odf4-drbd-rhel9:v4.21.0-1 drbdadm status
# Output
r0 role:Secondary
 disk:UpToDate
 master-2 role:Secondary
   peer-disk:UpToDate

Creating PV

  1. Create the PVs manually for the OSDs on both nodes.

a. Create the storageclass.

oc create -f lso-storageclass.yml

Use the lso-storageclass.yml file attached to this article.

b. Edit pv.yml to match your environment: the device path (for example, /dev/disk/by-id/wwn-0x58ce38ee2311e971), the storage size (for example, storage: 100Gi), and the nodeAffinity values (for example, values: - master-1).

oc create -f pv.yml

Use the pv.yml file attached to this article.
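The fields to edit can be seen in context in a sketch like the following. This is an illustrative local PersistentVolume only; the storage class name, PV name, and all values are placeholders, and the attached pv.yml is authoritative:

```yaml
# Illustrative sketch only - replace all placeholder values.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-master-1-sda
spec:
  capacity:
    storage: 100Gi              # sizes must match across nodes
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: localblock  # must match the class from lso-storageclass.yml
  volumeMode: Block
  local:
    path: /dev/disk/by-id/wwn-0x58ce38ee2311e971  # your disk-by-id path
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - master-1      # your node name
```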

  2. Verify the creation of the PVs.
oc get pv 

NOTE: If you are running OpenShift Container Platform with RHEL KVM, you must assign a serial number to your VM disk. Otherwise, the VM disk cannot be identified after reboot. You can use the virsh edit <VM> command to add the <serial>mydisk</serial> definition.

Installing OpenShift Data Foundation

  1. Install the ODF operator. For more information, see the OpenShift Data Foundation deployment documentation.
oc create -f odf.yml

Use the odf.yml file attached to this article.

  2. Label the nodes to designate them for ODF storage usage.
oc label nodes <node1> <node2> cluster.ocs.openshift.io/openshift-storage=""
  3. Verify that the nodes are labeled correctly.
oc get nodes --show-labels | grep openshift-storage
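On a two-node cluster, exactly two nodes must carry the storage label. A small sketch that counts them (the helper name is illustrative, not part of the attached scripts):

```shell
# Count labeled nodes from `oc get nodes -l cluster.ocs.openshift.io/openshift-storage= -o name`
count_storage_nodes() {
  grep -c '^node/' || true   # prints 0 when no node matches
}

# Typical use:
# n=$(oc get nodes -l cluster.ocs.openshift.io/openshift-storage= -o name | count_storage_nodes)
# [ "$n" -eq 2 ] || echo "expected 2 labeled nodes, found $n" >&2
```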
  4. Deploy the service that manages the floating Ceph monitor (floating monitor deployment).
    a. Edit mon-deployment.sh to change the Ceph image to the downstream image, then run the script from a shell that has kube API and oc access.

bash mon-deployment.sh

Use the mon-deployment.sh shell script attached to this article.
    b. Verify the deployment. The mon pod remains in Pending state until the StorageCluster is created in the next step.

oc get deployment | grep rook-ceph-mon-c
oc get svc -n openshift-storage
  5. Create the StorageCluster.

a. Instantiate the storage cluster.

oc create -f storagecluster.yml

Use the storagecluster.yml file attached to this article.

b. Verify the storage cluster.

oc get storagecluster -n openshift-storage

Wait for the status to become Ready.

Post-Installation Tuning

  1. Optimize resource consumption for the two-node environment.

# Run the script from a shell that has kube API and oc access.
bash update-csi-resources.sh

Use the update-csi-resources.sh shell script attached to this article.

Application support and failover

For information about how to deploy applications, refer to the Application using ODF Storage section in the ODF documentation.

  • Supported modes:

    • RBD and CephFS: RWX (ReadWriteMany) is fully supported.
    • RWO (ReadWriteOnce) failover: For applications using RWO storage to fail over correctly during a node shutdown, you must apply specific taints to the node marked for shutdown and ensure that the node does not automatically come back online.
  • Perform the following steps during maintenance or shutdown:

oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule
  • Perform the following to untaint after the node is back:
oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule-
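The taint and untaint pairs above must always be applied together; a sketch that wraps them in one helper (the function name and the DRY_RUN convention are illustrative, not part of the attached scripts):

```shell
# Apply or remove both out-of-service taints in one call.
# action: "taint" before shutdown, "untaint" after the node is back.
# DRY_RUN=1 prints the commands instead of running oc.
toggle_out_of_service() {
  node="$1"; action="$2"
  suffix=""
  if [ "$action" = "untaint" ]; then suffix="-"; fi
  for effect in NoExecute NoSchedule; do
    cmd="oc adm taint nodes $node node.kubernetes.io/out-of-service=nodeshutdown:${effect}${suffix}"
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
  done
}

# Typical use before shutting a node down:
# toggle_out_of_service master-2 taint
# ...and after it returns:
# toggle_out_of_service master-2 untaint
```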

NOTE: After node recovery, if the Ceph status shows OSDs down, restart the OSD pods to bring them back.

Troubleshooting

When DRBD build is in the error state:

oc get pods -n openshift-kmm
NAME                                       READY   STATUS    RESTARTS   AGE
drbd-kmod-build-hvg4l-build                0/1     Error     0          9m43s

Error: error: build error: Failed to push image: trying to reuse blob sha256:4638e3415987f378f2d6dd70f9c6a2869dd5ebd09e3510ba45e46bbb6ec1a3dd at destination: pinging container registry image-registry.openshift-image-registry.svc:5000: Get "https://image-registry.openshift-image-registry.svc:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority

Workaround
Delete the pod and module.

$ oc delete module drbd-kmod -n openshift-kmm
module.kmm.sigs.x-k8s.io "drbd-kmod" deleted
$ oc delete pods -n openshift-kmm drbd-kmod-build-hvg4l-build
pod "drbd-kmod-build-hvg4l-build" deleted

After cleanup, re-run the DRBD configuration step to trigger a fresh module build.

Reload the kernel module when the registry pod is restarted:

This situation occurs when:

  1. The active registry pod restarts, and the DRBD daemonset is also restarted
  2. The node hosting the active registry is fenced

Workaround
Patch the NodeModulesConfig CR for each node to force KMM (Kernel Module Management) to rebuild the DRBD module:

Run the following commands for both the nodes.

$ oc get nodemodulesconfig.kmm.sigs.x-k8s.io
$ oc patch nodemodulesconfig <node-name> --type=json --subresource=status -p='[{"op": "remove", "path": "/status/modules/0"}]'

Floating monitor enters drbd-init failover state during node failover

During node failover events, the floating monitor may enter an unexpected drbd-init state if the incorrect node is marked as DRBD Secondary.

Workaround
Manually set both nodes to DRBD Secondary to reset the DRBD role state:

oc debug node/"$Node" -- chroot /host sudo podman run --rm --privileged -v /dev:/dev -v /etc/drbd.conf:/etc/drbd.conf -v /etc/drbd.d:/etc/drbd.d --hostname "$Node" quay.io/rhceph-dev/odf4-drbd-rhel9:v4.21.0-1 drbdadm secondary r0 

Features not supported in TNF

The following features are not supported with two-node OpenShift with fencing (TNF):

  • Multicloud Object Gateway (NooBaa)
  • Network File System (NFS)
  • RADOS Object Gateway (RGW)
  • Disaster Recovery (Regional-DR and Metro-DR)
  • Pod disruption budgets (PDBs)
  • Mon failover
  • Multus
  • External PostgreSQL
  • Node selection
  • Performance/resource profiles
  • Automatic capacity scaling
  • Host networking
  • Erasure-coded (EC) pools
  • External clients (HCI/Provider)