How to safely reboot an OCS/ODF 4 node
Environment
- Red Hat OpenShift Data Foundation (RHODF) 4
- Red Hat OpenShift Container Storage (RHOCS) 4
Issue
- I need to do maintenance on a single RHODF/RHOCS node on RHOCP
- How do I safely reboot a single OCS/ODF node?
- How do I reboot multiple ODF nodes?
Resolution
NOTE: DO NOT apply this solution to ALL OCS/ODF nodes at once! This procedure is for a single node reboot. To reboot multiple nodes, see the guidelines further below.
- Cordon the node:

  $ oc adm cordon <node-name>

- Scale down the deployments for the mon and OSD(s) that are on the node.

  NOTE: Make note of the names of the deployments you scale down! You will need them later to make sure they are scaled back up once the node comes back online.

  $ oc get pods -o wide -n openshift-storage | grep <node-name> | egrep 'osd|mon'
  $ oc get deployment -n openshift-storage
  $ oc scale deployment <rook-ceph-mon|rook-ceph-osd> --replicas=0 -n openshift-storage
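The scale-down step can be scripted: the helper below derives the mon/OSD deployment names from the pod listing so you can record them for later. This is a minimal sketch; the `rook-ceph-mon-<letter>`/`rook-ceph-osd-<number>` name patterns are assumptions based on typical rook-ceph naming, so verify them against your cluster before relying on it.

```shell
# Reads `oc get pods -o wide` lines on stdin and prints one deployment
# name per line. Filters out unrelated pods such as rook-ceph-operator
# and rook-ceph-osd-prepare (its suffix is not numeric).
mon_osd_deployments() {
  awk '{print $1}' \
    | grep -E '^rook-ceph-(mon-[a-z]+|osd-[0-9]+)-' \
    | sed -E 's/^(rook-ceph-(mon-[a-z]+|osd-[0-9]+))-.*/\1/' \
    | sort -u
}
```

Example use: `oc get pods -o wide -n openshift-storage | grep <node-name> | mon_osd_deployments`, then scale each printed deployment to 0 replicas.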
- If the node is housing anything related to noobaa, the drain could get stuck on the noobaa pod(s).

  - For ODF 4.18 or older: noobaa uses a single PostgreSQL pod (often named noobaa-db-pg-0).
    You may delete the noobaa pods so they get scheduled on a different node.

    NOTE: Delete the noobaa pods in the following order:
    - noobaa-db
    - noobaa-core
    - noobaa-endpoint
    - noobaa-operator

    $ oc get pods -o wide -n openshift-storage | grep <node-name>
    $ oc delete pod <pod> -n openshift-storage
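The ordered deletion can be sketched as a small loop. `OC` is overridable so the logic can be dry-run; the pod-name prefixes assume the usual noobaa naming and this is a sketch, not a supported tool.

```shell
# Delete the noobaa pods on a given node in the documented order
# (db -> core -> endpoint -> operator) so they reschedule elsewhere.
OC="${OC:-oc}"   # override with a stub for a dry run

delete_noobaa_pods() {
  node="$1"
  for prefix in noobaa-db noobaa-core noobaa-endpoint noobaa-operator; do
    # select pods on the node whose name starts with the prefix
    $OC get pods -o wide -n openshift-storage --no-headers \
      | awk -v n="$node" -v p="$prefix" '$0 ~ n && index($1, p) == 1 {print $1}' \
      | while read -r pod; do
          $OC delete pod "$pod" -n openshift-storage
        done
  done
}
```

Example use: `delete_noobaa_pods <node-name>` after verifying the listing with `oc get pods -o wide -n openshift-storage | grep <node-name>`.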
  - For ODF 4.19 or higher: noobaa uses a high-availability setup with multiple PostgreSQL pods, such as noobaa-db-pg-cluster-1 and noobaa-db-pg-cluster-2.

    First check whether the node that you want to restart is hosting the noobaa DB primary instance:

    $ oc get clusters.postgresql.cnpg.noobaa.io -n openshift-storage
    NAME                   AGE   INSTANCES   READY   STATUS                     PRIMARY
    noobaa-db-pg-cluster   2d    2           2       Cluster in healthy state   noobaa-db-pg-cluster-1

    It is not recommended to restart the node hosting the primary instance. Instead, switch over and promote the secondary instance on a different node to be the primary. This can be done with the CNPG kubectl plugin in the cnpg-controller image.

    Deploy a debug pod of the cnpg-controller:

    $ oc debug deployment/cnpg-controller-manager

    From inside the pod, run the promote command:

    $ kubectl cnpg promote noobaa-db-pg-cluster <TARGET primary pod>

    Wait for the switchover to complete. Inspect the cluster resource and check that the primary is set to the desired pod and that the cluster is in a healthy state, e.g.:

    $ oc get cluster
    NAME                   AGE     INSTANCES   READY   STATUS                     PRIMARY
    noobaa-db-pg-cluster   5h22m   2           2       Cluster in healthy state   noobaa-db-pg-cluster-1
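Picking the current primary out of that listing can be scripted. The sketch below assumes the PRIMARY name is the last column of the `oc get clusters.postgresql.cnpg.noobaa.io` output, as in the example above.

```shell
# Print the current PRIMARY instance name from the CNPG cluster listing.
# Assumes PRIMARY is the last column (the STATUS text contains spaces,
# so only $NF is reliable here).
primary_instance() {
  # stdin: `oc get clusters.postgresql.cnpg.noobaa.io -n openshift-storage` output
  awk 'NR > 1 {print $NF}'
}
```

You can then compare the printed pod name's node (for example via `oc get pod <primary-pod> -n openshift-storage -o jsonpath='{.spec.nodeName}'`) with the node you plan to reboot, and promote the secondary first if they match.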
- Drain the node of all pods.

  If using RHOCP 4.6, use this command:

  $ oc adm drain <node-name> --delete-local-data --grace-period=1 --ignore-daemonsets

  If using RHOCP 4.7 or later, use this command:

  $ oc adm drain <node-name> --delete-emptydir-data --grace-period=1 --ignore-daemonsets
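The version split above can be captured in a tiny helper. A sketch, assuming only that the flag was renamed from `--delete-local-data` to `--delete-emptydir-data` starting with RHOCP 4.7, as described:

```shell
# Print the correct drain flag for a given RHOCP version string.
drain_flag() {
  ver="$1"                          # e.g. "4.6" or "4.12.3"
  minor="${ver#*.}"; minor="${minor%%.*}"
  if [ "${ver%%.*}" -eq 4 ] && [ "$minor" -le 6 ]; then
    printf '%s\n' "--delete-local-data"
  else
    printf '%s\n' "--delete-emptydir-data"
  fi
}
```

Example use: `oc adm drain <node-name> $(drain_flag 4.12) --grace-period=1 --ignore-daemonsets`.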
- Reboot the node:

  $ oc debug node/<node-name>
  $ chroot /host
  $ systemctl reboot
- Wait for the node to become 'Ready':

  $ watch 'oc get nodes | grep <node-name>'
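Instead of eyeballing `watch`, the Ready check can be scripted. Note that a rebooted but still-cordoned node typically shows a STATUS of Ready,SchedulingDisabled, so the sketch matches the leading "Ready" only (and must not match "NotReady"):

```shell
# Succeeds when the node's STATUS column starts with "Ready".
node_is_ready() {
  # stdin: the node's line from `oc get nodes`
  awk '{print $2}' | grep -qE '^Ready(,|$)'
}
```

Example use: `until oc get nodes --no-headers | grep <node-name> | node_is_ready; do sleep 10; done`.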
- Uncordon the node only when it is in the 'Ready' state:

  $ oc adm uncordon <node-name>
- If you scaled down any deployments above, make sure to scale them back up:

  $ oc get deployment -n openshift-storage
  $ oc scale deployment <deployment-name> --replicas=1 -n openshift-storage
- Verify that the Ceph cluster is in status HEALTH_OK, and monitor the cluster until the placement groups (pgs) are back in status active+clean (see Diagnostic Steps below).
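The HEALTH_OK / active+clean check can also be scripted for use in a polling loop. A sketch that parses `ceph -s` output; note it treats compound states containing active+clean (such as active+clean+scrubbing) as clean.

```shell
# Succeeds only when `ceph -s` output (stdin) reports HEALTH_OK and no
# pg line lists a state other than active+clean.
ceph_health_ok() {
  out=$(cat)
  printf '%s\n' "$out" | grep -q 'HEALTH_OK' || return 1
  # any remaining "active+..." line means recovery is still in progress
  ! printf '%s\n' "$out" | grep -F 'active+' | grep -Fv 'active+clean' | grep -q .
}
```

Example use from the toolbox/ceph-tools pod: `until ceph -s | ceph_health_ok; do sleep 30; done`.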
To reboot more than one ODF node, follow the guidelines below:
- Verify that ODF is in a healthy state. You can check the status from the ODF console; additionally, you can check the Ceph status using the Ceph CLI:

  # ceph status
- Reboot the first node by following the single-node reboot guidelines given above.
- Once the first node becomes Ready, verify that all the ODF workloads have been properly scheduled on it:

  # oc get pods -n openshift-storage
- Verify that the Ceph status is healthy again. This is important, as we want the data to be back in sync between the node reboots:

  # ceph status
- Once the Ceph status is healthy again, reboot the second node (again following the single-node guidelines).
- Repeat the same steps to reboot the rest of the nodes.
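The multi-node procedure above can be sketched as a simple serial loop: reboot one node, wait for it to be Ready and for Ceph to report healthy, then move on. The three helper functions below are placeholders standing in for the single-node steps; wire them up to the real commands (cordon, scale-down, drain, uncordon, scale-up) before using this.

```shell
# Placeholder helpers -- replace with the full single-node procedure.
reboot_node()           { oc debug node/"$1" -- chroot /host systemctl reboot; }
wait_for_ready()        { until oc get node "$1" --no-headers | grep -qE '\bReady\b'; do sleep 10; done; }
wait_for_ceph_healthy() { :; }   # e.g. poll `ceph status` until HEALTH_OK

# Reboot the given nodes strictly one at a time, never proceeding
# until the cluster has recovered from the previous reboot.
reboot_nodes_one_by_one() {
  for node in "$@"; do
    reboot_node "$node"
    wait_for_ready "$node"
    wait_for_ceph_healthy
  done
}
```

This serialization is the point of the guidelines: rebooting a second node before Ceph has resynced risks taking the cluster below quorum or losing data redundancy.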
Diagnostic Steps
- Ensure you are logged in to an RHOCP cluster:

  $ oc get clusterversion
  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.6.4     True        False         6m25s   Cluster version is 4.6.4
- Log in to the Ceph CLI and verify that Ceph's health status reads HEALTH_OK:

  sh-4.4$ ceph -s
    cluster:
      id:     {redacted}
      health: HEALTH_OK

    services:
      mon: 3 daemons, quorum a,b,d (age 4d)
      mgr: a (active, since 4d)
      mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
      osd: 3 osds: 3 up (since 4d), 3 in (since 12d)
      rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

    task status:

    data:
      pools:   10 pools, 80 pgs
      objects: 19.95k objects, 76 GiB
      usage:   229 GiB used, 2.8 TiB / 3.0 TiB avail
      pgs:     80 active+clean

    io:
      client: 853 B/s rd, 15 KiB/s wr, 1 op/s rd, 1 op/s wr
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.