Certain Pods Have High Restarts - OpenShift Data Foundation (ODF)

Environment

Red Hat OpenShift Data Foundation (RHODF) v4.9+

Issue

If high restart counts are observed on the pods listed under Diagnostic Steps, this usually indicates that the odf-operator-controller-manager does not have enough memory resources (requests/limits).

NOTE: High restart counts on the noobaa-operator pod are a separate, known issue that has been fixed in ODF v4.12.5+.

Resolution

NOTE: This process involves editing the odf-operator Cluster Service Version (CSV). This means that an ODF upgrade will replace the old CSV with a new one and you may need to reapply the workaround.

  1. Capture the current ODF Operator version:

$ oc get csv -n openshift-storage | grep odf-operator

  2. Edit the odf-operator Cluster Service Version (CSV) with the oc edit csv -n openshift-storage odf-operator.v<version>-rhodf command:

Example:

$ oc edit csv -n openshift-storage odf-operator.v4.14.5-rhodf

  3. Increase the memory requests and limits to 800Mi:

    resources:
      limits:
        cpu: 200m
        memory: 300Mi    <------ Increase to 800Mi
      requests:
        cpu: 200m
        memory: 200Mi    <------ Increase to 800Mi

  4. If high pod restarts are still observed, consider raising the memory requests/limits further, for example to 1Gi.
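
After the CSV is saved, it can be useful to confirm that the new values reached the running deployment. The following is a minimal sketch, not part of the original procedure: the function name verify_odf_operator_resources and the 120s timeout are illustrative assumptions, and it assumes a logged-in oc session with access to the openshift-storage namespace.

```shell
# Hypothetical helper (name and timeout are assumptions for illustration).
# Waits for the operator deployment to finish rolling out, then prints each
# container name together with its configured resources.
verify_odf_operator_resources() {
  ns="openshift-storage"
  deploy="odf-operator-controller-manager"

  # Stop early if the rollout does not complete within the timeout.
  oc rollout status -n "$ns" "deployment/$deploy" --timeout=120s || return 1

  # One line per container: <name><TAB><resources>
  oc get deployment -n "$ns" "$deploy" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}

# Usage (requires a logged-in oc session):
#   verify_odf_operator_resources
```

If the memory values printed for the manager container still show the old defaults, the CSV edit may not have been saved or may have been replaced by an upgrade.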

NOTE: Engineering has provided a fix that increases the odf-operator-controller-manager pod memory limit to 400Mi in response to a recent issue in ODF 4.19 (see "odf-operator-controller-manager pod is OOMKilled after updating ODF to 4.19"). This fix will be available in:

ODF 4.21
ODF 4.20.6 or higher
ODF 4.19.11 or higher

Root Cause

For certain workloads, the default odf-operator or csi-addons-controller-manager memory requests/limits may not suffice. A manual edit to increase these resources to higher values should alleviate this issue.

Diagnostic Steps

NAME                                                              READY   STATUS    RESTARTS    AGE
csi-addons-controller-manager-56cb476d98-f4f42                    2/2     Running   41          2d2h
csi-cephfsplugin-nlqdc                                            2/2     Running   57          2d2h
csi-cephfsplugin-provisioner-697944789b-l5gnj                     5/5     Running   62          2d2h
csi-rbdplugin-9zwr9                                               3/3     Running   81          2d2h
csi-rbdplugin-provisioner-75d5744db7-phckn                        6/6     Running   44          2d2h
ocs-metrics-exporter-5f8bf47cc5-gh6j4                             1/1     Running   39          2d2h
odf-console-7d9bd98ddc-zr58z                                      1/1     Running   21          2d2h
odf-operator-controller-manager-59c6c74cb-n4hct                   2/2     Running   121         2d2h
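
A listing like the one above can be narrowed to the pods that restart most often. The sketch below is an illustration rather than part of the original diagnostics: the function name high_restarts and the threshold of 20 are assumptions, and it expects the default oc get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) on standard input.

```shell
# Hypothetical helper (name is an assumption for illustration).
# Reads `oc get pods` output on stdin and prints "<pod> <restarts>" for every
# pod whose restart count is at least the given threshold, highest first.
high_restarts() {
  min="${1:-20}"
  # NR > 1 skips the header row; $4 is the RESTARTS column.
  awk -v min="$min" 'NR > 1 && $4 + 0 >= min { print $1, $4 }' |
    sort -k2,2nr
}

# Usage (requires a logged-in oc session):
#   oc get pods -n openshift-storage | high_restarts 20
```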

odf-operator-controller-manager.yaml:

    lastState:
      terminated:
        containerID: <omitted>
        exitCode: 1
        finishedAt: "2023-10-06T18:44:42Z"
        reason: Error
        startedAt: "2023-10-06T18:36:13Z"
    name: manager
    ready: true
    restartCount: 121
    started: true

csi-addons-controller-manager.yaml:

  - containerID: <omitted>
    imageID: <omitted>
    lastState:
      terminated:
        containerID: <omitted>
        exitCode: 137
        finishedAt: "2024-03-22T03:00:10Z"
        reason: OOMKilled
        startedAt: "2024-03-22T02:59:44Z"
    name: manager
    ready: true
    restartCount: 41
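
The same fields shown in the excerpts above (restartCount and the reason under lastState.terminated) can be pulled directly from a pod without reading the full YAML. This is a minimal sketch under assumptions: the function name last_termination is hypothetical, and the pod name passed to it must be taken from the cluster being examined.

```shell
# Hypothetical helper (name is an assumption for illustration).
# Prints one line per container of the given pod:
#   <container name><TAB><restartCount><TAB><last termination reason>
last_termination() {
  pod="$1"
  oc get pod -n openshift-storage "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage (requires a logged-in oc session), with a pod name from the listing:
#   last_termination csi-addons-controller-manager-56cb476d98-f4f42
```

A reason of OOMKilled, as in the csi-addons-controller-manager excerpt, points to the container exceeding its memory limit.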
  • To check the values currently configured on odf-operator-controller-manager:

    $ oc get deployments -n openshift-storage odf-operator-controller-manager -oyaml | grep -C2 "cpu:"
      resources:
        limits:
          cpu: 200m
          memory: 300Mi
        requests:
          cpu: 200m
          memory: 200Mi
      securityContext:
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.