How to Redeploy rook-ceph-mon Resources in OpenShift Data Foundation (ODF)
Environment
- Red Hat OpenShift Container Platform (RHOCP) v4.x
- Red Hat OpenShift Data Foundation (RHODF) v4.x
Issue
There are instances where the rook-ceph-mon resources must be redeployed, such as an OSD migration to a new datastore/storageclass with PVC-backed rook-ceph-mon resources. For OSD migrations to a new datastore/storageclass where the old storageclass/datastore will be removed, redeploying the mons is a mandatory step: removing the old storageclass while the mons are still backed by that storageclass can cause data loss.
Additionally, there may be inconsistencies between the rook-ceph-mon placement (scheduled nodes) and the rook-ceph-mon-endpoints configmap, where a redeployment of the rook-ceph-mon resource is needed.
Accomplishing the redeployment is safe and straightforward as long as the warning below is followed, as this process only scales a rook-ceph-mon resource down long enough for Rook to reconcile the discrepancy.
Resolution
NOTE: The rook-ceph-operator pod MUST be running for the steps below to be successful.
**WARNING:** This solution must begin with all three rook-ceph-mon resources in quorum. Perform a rook-ceph-mon migration ONE MON AT A TIME, moving to the next mon ONLY after the replacement mon has joined quorum.
NOTE: ONLY for users who have migrated the OSD datastore/storageclass with PVC-backed rook-ceph-mon pods: ensure the NEW storageclass is now shown in the dataPVCTemplate of the StorageCluster CR and/or is now reflected in the CephCluster CR:
$ oc get storagecluster -n openshift-storage ocs-storagecluster -o yaml | grep -A10 dataPVCTemplate
$ oc get cephcluster -n openshift-storage ocs-storagecluster-cephcluster -o yaml | grep storageClassName
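The same check can be scripted instead of grepping the full YAML. A minimal sketch; the jsonpath assumes the mons are PVC-backed via the standard Rook `spec.mon.volumeClaimTemplate` field, so verify against your CR if the output is empty:

```shell
# Print the storage class configured for the mon volume claim template in the
# CephCluster CR. An empty result means the mons are not PVC-backed or the
# field lives elsewhere in your CR version.
mon_storageclass() {
  oc get cephcluster -n openshift-storage ocs-storagecluster-cephcluster \
    -o jsonpath='{.spec.mon.volumeClaimTemplate.spec.storageClassName}'
}
```

Compare the printed value against the name of the new storageclass before proceeding.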
PROCEDURE:
- Validate all three (or more) mons are in quorum:
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config
health: HEALTH_OK <------------------------ Ceph is Healthy
services:
mon: 3 daemons, quorum a,c,d (age 41s) <--- ALL THREE MONS ARE IN QUORUM, SAFE TO MIGRATE
mgr: a(active, since 5s), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 44h), 3 in (since 44h)
data:
volumes: 1/1 healthy
pools: 4 pools, 256 pgs
objects: 578 objects, 1.8 GiB
usage: 6.5 GiB used, 293 GiB / 300 GiB avail
pgs: 256 active+clean
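The quorum count can also be checked programmatically rather than by eye. A sketch that parses the `mon:` line of the plain `ceph status` output shown above (the helper name is an assumption):

```shell
# Count the mons listed after "quorum" on the `mon:` line of `ceph status`
# output, e.g. "mon: 3 daemons, quorum a,c,d (age 41s)" -> 3.
quorum_count() {
  awk '/mon:/ { for (i = 1; i <= NF; i++) if ($i == "quorum") print split($(i+1), m, ",") }'
}

# Example against the sample status line above; in the cluster, pipe the real
# `ceph status` output through the helper instead.
printf '%s\n' 'mon: 3 daemons, quorum a,c,d (age 41s)' | quorum_count   # prints 3
```

If the count is lower than the number of configured mons, do not begin (or continue) the migration.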
- Backup the current rook-ceph-mon-endpoints configmap and any affected mon deployment(s):
$ oc get cm -n openshift-storage rook-ceph-mon-endpoints -o yaml > rook-ceph-mon-endpoints.yaml
$ oc get deployment -n openshift-storage rook-ceph-mon-<X> -o yaml > rook-ceph-mon-<X>.yaml
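When several mons will be migrated, all of the backups can be taken in one pass. A sketch; the output file names are an assumption, so adjust them to your own conventions:

```shell
# Back up the endpoints configmap and every rook-ceph-mon deployment at once.
backup_mons() {
  oc get cm -n openshift-storage rook-ceph-mon-endpoints -o yaml > rook-ceph-mon-endpoints.yaml
  for dep in $(oc get deployment -n openshift-storage -l app=rook-ceph-mon -o name); do
    # $dep looks like "deployment.apps/rook-ceph-mon-a"; strip the prefix for the file name.
    oc get "$dep" -n openshift-storage -o yaml > "${dep##*/}.yaml"
  done
}
```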
- Scale the first affected rook-ceph-mon down:
$ oc scale deployment -n openshift-storage rook-ceph-mon-<X> --replicas=0
- Wait approximately 10-15 minutes; the rook-ceph-operator will deploy a new mon, likely using a new/different letter, and will remove the old deployment once the replacement mon joins quorum:
$ oc get deployment -n openshift-storage -l app=rook-ceph-mon
NAME READY UP-TO-DATE AVAILABLE AGE
rook-ceph-mon-a 1/1 1 1 62m
rook-ceph-mon-c 0/0 0 0 44h
rook-ceph-mon-d 1/1 1 1 22h
$ oc get deployment -n openshift-storage -l app=rook-ceph-mon
NAME READY UP-TO-DATE AVAILABLE AGE
rook-ceph-mon-a 1/1 1 1 67m
rook-ceph-mon-c 0/0 0 0 44h <--- old mon
rook-ceph-mon-d 1/1 1 1 22h
rook-ceph-mon-e 0/1 1 0 13s <--- new mon
$ oc get deployment -n openshift-storage -l app=rook-ceph-mon
NAME READY UP-TO-DATE AVAILABLE AGE
rook-ceph-mon-a 1/1 1 1 67m
rook-ceph-mon-d 1/1 1 1 22h
rook-ceph-mon-e 1/1 1 1 30s <---- Complete
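Rather than re-running `oc get deployment` by hand, the wait can be scripted as a simple poll. A sketch; the 15-minute timeout and 10-second interval are assumptions, and the mon letter (e.g. `c`) is supplied by the user:

```shell
# Poll until the scaled-down mon deployment has been removed by the operator,
# or give up after roughly 15 minutes.
wait_for_removal() {
  old_mon=$1
  for _ in $(seq 1 90); do   # 90 iterations x 10s = 15 minutes
    if ! oc get deployment -n openshift-storage "rook-ceph-mon-${old_mon}" >/dev/null 2>&1; then
      echo "rook-ceph-mon-${old_mon} removed"
      return 0
    fi
    sleep 10
  done
  echo "timed out waiting for rook-ceph-mon-${old_mon} removal" >&2
  return 1
}
```

Usage: `wait_for_removal c`, then run the verification step below before touching the next mon.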
- Verification steps before moving to the next rook-ceph-mon migration (if applicable):
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config
health: HEALTH_OK <------------------------ Ceph is Healthy
services:
mon: 3 daemons, quorum a,d,e (age 41s) <--- ALL THREE MONS ARE IN QUORUM, SAFE TO MIGRATE NEXT MON
mgr: a(active, since 5s), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 44h), 3 in (since 44h)
data:
volumes: 1/1 healthy
pools: 4 pools, 256 pgs
objects: 578 objects, 1.8 GiB
usage: 6.5 GiB used, 293 GiB / 300 GiB avail
pgs: 256 active+clean
- Repeat the scale-down, wait, and verification steps above for each additional rook-ceph-mon resource that needs to be redeployed.
NOTE: ONLY for users who have migrated the OSD datastore/storageclass with PVC-backed rook-ceph-mon pods: ensure the NEW storageclass is now reflected on the rook-ceph-mon PVCs:
$ oc get pvc -n openshift-storage | grep rook-ceph-mon
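This check can be made strict so no mon PVC is missed. A sketch; the new storageclass name is passed in by the user, and the STORAGECLASS column position (field 6 in default `oc get pvc` output) is an assumption that may shift between client versions:

```shell
# Exit non-zero if any rook-ceph-mon PVC still references a storage class
# other than the expected new one, printing the offending PVCs.
check_mon_pvcs() {
  new_sc=$1
  oc get pvc -n openshift-storage | awk -v sc="$new_sc" \
    '/rook-ceph-mon/ && $6 != sc { print $1 " still on " $6; bad = 1 } END { exit bad }'
}
```

Usage: `check_mon_pvcs <new-storageclass>`; a silent exit 0 means every mon PVC is on the new storageclass.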
Root Cause
The rook-ceph-mon resources must be redeployed so that PVC-backed rook-ceph-mon pods can still access their storage and maintain quorum.
In ODF versions bundled with Rook v1.15 or later, a change was introduced in the ClusterController logic that strictly enforces nodeAffinity and podAntiAffinity, whereas previous versions were more permissive of placement violations. This can cause the rook-ceph-mon resources to enter a down state if the discrepancy is not resolved prior to an upgrade.
Diagnostic Steps
Ensure the node column on the pods matches the data/mapping in the rook-ceph-mon-endpoints configmap:
NOTE: For PVC-backed monitors, mapping will reflect null.
$ oc get pods -n openshift-storage -l app=rook-ceph-mon -o wide
$ oc get cm -n openshift-storage rook-ceph-mon-endpoints -o yaml
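The two outputs can be pulled in a more directly comparable form. A sketch; the `mon` pod label and the `data.mapping` configmap key reflect typical Rook deployments and should be verified against your cluster:

```shell
# Show where each mon pod is scheduled next to the node mapping recorded in
# the rook-ceph-mon-endpoints configmap (for PVC-backed mons the mapping
# entries are null, per the note above).
diag_mon_placement() {
  oc get pods -n openshift-storage -l app=rook-ceph-mon \
    -o custom-columns='MON:.metadata.labels.mon,NODE:.spec.nodeName'
  oc get cm -n openshift-storage rook-ceph-mon-endpoints \
    -o jsonpath='{.data.mapping}{"\n"}'
}
```

Any mon whose scheduled node disagrees with its mapping entry is a candidate for the redeployment procedure above.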
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.