Certain Pods Have High Restarts - OpenShift Data Foundation (ODF)
Environment
Red Hat OpenShift Data Foundation (RHODF) v4.9+
Issue
If high pod restarts are observed on the following pods, this is usually an indication of the odf-operator-controller-manager not having enough memory resources (requests/limits):
- ceph-csi-controller-manager (separate article: ceph-csi-controller-manager Pod has High Restarts)
- csi-addons-controller-manager (separate article: Pod csi-addons-controller-manager OOMKilled)
- csi-cephfsplugin
- csi-cephfsplugin-provisioner
- csi-rbdplugin
- csi-rbdplugin-provisioner
- ocs-operator
- ocs-metrics-exporter
- odf-console
- odf-operator-controller-manager
The ocs-metrics-exporter uses the odf-operator-controller-manager resources in ODF v4.18 and below; however, in ODF v4.19+ the ocs-metrics-exporter and its respective containers must be patched in the storagecluster. See the solution "ODF: The ocs-metrics-exporter killed by OOM repeatedly" for more information.
NOTE: If the noobaa-operator pod is observed to have high pod restarts, this is a separate/known issue. This issue has been fixed in ODF v4.12.5+.
Resolution
NOTE: This process involves editing the odf-operator Cluster Service Version (CSV). This means that an ODF upgrade will replace the old CSV with a new one and you may need to reapply the workaround.
- Capture the current ODF Operator version:
$ oc get csv -n openshift-storage | grep odf-operator
- Edit the odf-operator Cluster Service Version (CSV) with the oc edit csv -n openshift-storage odf-operator.v<version>-rhodf command:
Example:
$ oc edit csv -n openshift-storage odf-operator.v4.14.5-rhodf
- Increase the memory requests and limits to 800Mi:
resources:
  limits:
    cpu: 200m
    memory: 300Mi <------ Increased to 800Mi
  requests:
    cpu: 200m
    memory: 200Mi <------ Increased to 800Mi
- If high pod restarts are still observed, consider increasing the memory requests/limits to even higher values (~1Gi).
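Because an ODF upgrade replaces the CSV, the edit above may need to be repeated, and it could be scripted. The following is a minimal sketch under stated assumptions, not a supported procedure. The helper names odf_csv_name and csv_memory_patch are hypothetical, and the deployment and container indexes passed to csv_memory_patch are assumptions that must be confirmed against the actual CSV (for example with oc get csv <name> -n openshift-storage -o yaml) before anything is applied.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get csv` output on stdin and print the
# name of the first CSV whose name starts with "odf-operator.".
odf_csv_name() {
  awk '$1 ~ /^odf-operator\./ { print $1; exit }'
}

# Hypothetical helper: build a JSON Patch that sets the memory limit and
# request of one container inside a CSV deployment.
# Arguments: deployment index, container index, memory value (e.g. 800Mi).
# ASSUMPTION: the indexes are not known in advance; check them in the CSV.
csv_memory_patch() {
  dep="$1"; ctr="$2"; mem="$3"
  base="/spec/install/spec/deployments/${dep}/spec/template/spec/containers/${ctr}/resources"
  printf '[{"op":"replace","path":"%s/limits/memory","value":"%s"},{"op":"replace","path":"%s/requests/memory","value":"%s"}]\n' \
    "$base" "$mem" "$base" "$mem"
}

# Usage against a live cluster (assumes a logged-in oc session; the
# indexes 0 and 1 below are placeholders, not verified values):
#   CSV="$(oc get csv -n openshift-storage | odf_csv_name)"
#   oc patch csv "$CSV" -n openshift-storage --type=json \
#     -p "$(csv_memory_patch 0 1 800Mi)"
```

Printing the patch first and reviewing it before passing it to oc patch keeps the change as deliberate as the interactive edit.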
NOTE: Engineering has provided a fix that increases the odf-operator-controller-manager pod memory limit to 400Mi, in response to a recent issue in ODF 4.19 (see: odf-operator-controller-manager pod is OOMKilled after updating ODF to 4.19). This fix will be available in:
- ODF 4.21
- ODF 4.20.6 or higher
- ODF 4.19.11 or higher
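To decide whether an installed version already carries the 400Mi default, the version list above can be encoded as a small check. This is a sketch only: the function name odf_has_fix is hypothetical, it handles 4.x versions only, and it assumes that minor releases after 4.21 also include the fix.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given ODF version includes the
# 400Mi memory limit fix, 1 otherwise.
# Accepts forms such as 4.19.11, v4.20.6 or 4.14.5-rhodf.
odf_has_fix() {
  ver="${1#v}"       # drop a leading "v"
  ver="${ver%%-*}"   # drop a suffix such as "-rhodf"
  old_ifs="$IFS"; IFS=.
  set -- $ver        # split into major, minor, patch
  IFS="$old_ifs"
  major="${1:-0}"; minor="${2:-0}"; patch="${3:-0}"
  [ "$major" -eq 4 ] || return 1
  # ASSUMPTION: 4.21 and any later 4.x minor release carry the fix.
  if [ "$minor" -ge 21 ]; then return 0; fi
  if [ "$minor" -eq 20 ] && [ "$patch" -ge 6 ]; then return 0; fi
  if [ "$minor" -eq 19 ] && [ "$patch" -ge 11 ]; then return 0; fi
  return 1
}

# Usage:
#   if odf_has_fix 4.19.11; then echo "fix included"; fi
```

A version that passes this check ships with the 400Mi limit only; the manual increase to 800Mi described above may still be needed for heavier workloads.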
Root Cause
For certain workloads, the default odf-operator or csi-addons-controller-manager memory requests/limits may not suffice. A manual edit to increase these resources to higher values should alleviate this issue.
Diagnostic Steps
NAME                                              READY   STATUS    RESTARTS   AGE
csi-addons-controller-manager-56cb476d98-f4f42    2/2     Running   41         2d2h
csi-cephfsplugin-nlqdc                            2/2     Running   57         2d2h
csi-cephfsplugin-provisioner-697944789b-l5gnj     5/5     Running   62         2d2h
csi-rbdplugin-9zwr9                               3/3     Running   81         2d2h
csi-rbdplugin-provisioner-75d5744db7-phckn        6/6     Running   44         2d2h
ocs-metrics-exporter-5f8bf47cc5-gh6j4             1/1     Running   39         2d2h
odf-console-7d9bd98ddc-zr58z                      1/1     Running   21         2d2h
odf-operator-controller-manager-59c6c74cb-n4hct   2/2     Running   121        2d2h
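A pod listing like the one above can be filtered so that only pods with a high restart count remain. The sketch below assumes the default oc get pods column layout, where RESTARTS is the fourth column. The function name high_restarts and the default threshold of 20 are hypothetical choices, not values from this article.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` output on stdin and print
# "<pod> <restarts>" for every pod whose restart count exceeds the
# threshold given as the first argument (default 20, an arbitrary value).
high_restarts() {
  threshold="${1:-20}"
  # NR > 1 skips the header line; $4 is the RESTARTS column.
  awk -v t="$threshold" 'NR > 1 && $4 + 0 > t { print $1, $4 }'
}

# Usage against a live cluster (assumes a logged-in oc session):
#   oc get pods -n openshift-storage | high_restarts 50
```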
odf-operator-controller-manager.yaml:
    lastState:
      terminated:
        containerID: <omitted>
        exitCode: 1
        finishedAt: "2023-10-06T18:44:42Z"
        reason: Error
        startedAt: "2023-10-06T18:36:13Z"
    name: manager
    ready: true
    restartCount: 121
    started: true
csi-addons-controller-manager.yaml:
  - containerID: <omitted>
    imageID: <omitted>
    lastState:
      terminated:
        containerID: <omitted>
        exitCode: 137
        finishedAt: "2024-03-22T03:00:10Z"
        reason: OOMKilled
        startedAt: "2024-03-22T02:59:44Z"
    name: manager
    ready: true
    restartCount: 41
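The lastState fields shown above can also be read directly from a running pod instead of from a saved YAML file. The sketch below assumes a jsonpath query that prints one line per container in the form "<name> <reason> <exitCode>", then flags containers whose last termination was an OOM kill. The function name oom_containers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "<container> <reason> <exitCode>" on
# stdin and print the containers whose last termination was an OOM kill
# (reason OOMKilled or exit code 137).
oom_containers() {
  awk '$2 == "OOMKilled" || $3 == 137 { print $1 }'
}

# Usage against a live cluster (assumes a logged-in oc session and a pod
# name in $POD):
#   oc get pod "$POD" -n openshift-storage \
#     -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.reason}{" "}{.lastState.terminated.exitCode}{"\n"}{end}' \
#     | oom_containers
```

A container that terminated with reason Error and exit code 1, as in the odf-operator-controller-manager sample above, is not reported by this filter and needs its logs reviewed separately.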
- To check the current values configured on odf-operator-controller-manager:
$ oc get deployments -n openshift-storage odf-operator-controller-manager -oyaml -C2
        resources:
          limits:
            cpu: 200m
            memory: 300Mi
          requests:
            cpu: 200m
            memory: 200Mi
        securityContext: