ODF: The ocs-metrics-exporter pod is repeatedly OOMKilled


Environment

Red Hat OpenShift Container Platform (OCP) 4.18+
Red Hat OpenShift Data Foundation (ODF) 4.18+

Issue

Containers in the ocs-metrics-exporter pod are repeatedly killed by the OOM killer (exit code 137, reason OOMKilled).

Examples:

OCP Cluster ID: Redacted
Ceph Cluster ID: Redacted
OCP Version: 4.18.15
ODF Version: 4.18.6

$ oc get pod ocs-metrics-exporter-6f945875f-mpgcz -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
{"name":"kube-rbac-proxy-main","restarts":109,"exitCode":137,"reason":"OOMKilled"}
{"name":"kube-rbac-proxy-self","restarts":23,"exitCode":137,"reason":"OOMKilled"}
{"name":"ocs-metrics-exporter","restarts":0,"exitCode":null,"reason":null}

OCP Cluster ID: Redacted
Ceph Cluster ID: Redacted
OCP Version: 4.18.15
ODF Version: 4.18.6

$ oc get pod ocs-metrics-exporter-6f945875f-6nc4l -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
{"name":"kube-rbac-proxy-main","restarts":2,"exitCode":137,"reason":"OOMKilled"}
{"name":"kube-rbac-proxy-self","restarts":7,"exitCode":137,"reason":"OOMKilled"}
{"name":"ocs-metrics-exporter","
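The jq filter above prints every container status in the pod. The following minimal Python sketch reads the same fields (`.status.containerStatuses[].lastState.terminated`) from `oc get pod -o json` output and reports only the containers whose last termination reason was OOMKilled. This can help when checking several pods. The script name `oom_check.py` is hypothetical.

```python
import json
import sys


def oom_killed_containers(pod: dict) -> list:
    """Return the containers whose last termination reason was OOMKilled.

    Reads the same fields as the jq filter above:
    .status.containerStatuses[].lastState.terminated
    """
    found = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState, or lastState.terminated, may be missing or empty
        terminated = (status.get("lastState") or {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            found.append(
                {
                    "name": status.get("name"),
                    "restarts": status.get("restartCount", 0),
                    "exitCode": terminated.get("exitCode"),
                }
            )
    return found


def load_json(path: str) -> dict:
    """Load JSON from a file, or from stdin when path is '-'."""
    if path == "-":
        return json.load(sys.stdin)
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: oc get pod <pod-name> -n openshift-storage -o json | python3 oom_check.py -
    for c in oom_killed_containers(load_json(sys.argv[1])):
        print(f"{c['name']}: OOMKilled (exitCode={c['exitCode']}, restarts={c['restarts']})")
```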

Resolution

The issue is resolved in the releases listed below and in later releases. Red Hat recommends upgrading ODF to one of these versions or later to remove this issue from your OpenShift environment. As part of the code remediation, the CPU and memory resources of the ocs-metrics-exporter pod containers can now be changed; see the Diagnostic Steps section of this solution.

Product/Version   Related BZ/Jira      Errata             Fixed Version
ODF/4.20          Jira DFBUGS-3286     RHSA-2025:21704    4.20.0
ODF/4.19          Jira DFBUGS-4124     RHSA-2025:21378    4.19.7
ODF/4.18          Jira DFBUGS-4125     RHSA-2025:21368    4.18.12
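To check whether an installed ODF version already contains the fix, the table can be encoded as data. This Python sketch assumes a plain `major.minor.patch` version string; an optional leading `v` and any `-suffix` are stripped. It treats any later minor stream as fixed, following the "and higher" wording above, and returns `None` when a stream is not covered by the table. The installed version can be read, for example, from the VERSION column of `oc get csv -n openshift-storage`. The script name `odf_fix_check.py` is hypothetical.

```python
import sys
from typing import Optional, Tuple

# Fixed version per minor stream, taken from the table above.
FIXED = {
    (4, 18): (4, 18, 12),
    (4, 19): (4, 19, 7),
    (4, 20): (4, 20, 0),
}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' (optional leading 'v', '-suffix' ignored) into integers."""
    core = version.strip().lstrip("v").split("-")[0]
    parts = (core.split(".") + ["0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def has_fix(version: str) -> Optional[bool]:
    """True if the version includes the fix, False if not, None if the stream is not listed."""
    v = parse_version(version)
    stream = v[:2]
    if stream in FIXED:
        return v >= FIXED[stream]
    if stream > max(FIXED):
        # Later minor stream: covered by "and higher" in the Resolution text
        return True
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 odf_fix_check.py 4.18.6
    print(has_fix(sys.argv[1]))
```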

Root Cause

A code issue, compounded by the inability to easily change the CPU and memory resources of the ocs-metrics-exporter pod containers.

Diagnostic Steps

The ocs-metrics-exporter pod runs three containers: kube-rbac-proxy-main, kube-rbac-proxy-self, and ocs-metrics-exporter.

With the remediated code, the CPU and Memory resources are set to these default values:

$ oc get deploy ocs-metrics-exporter -oyaml | grep -B11 -A3 resources:
        name: kube-rbac-proxy-main
        resources:
          limits:
            cpu: 50m
            memory: 40Mi
          requests:
            cpu: 50m
            memory: 40Mi
        securityContext:
--
        name: kube-rbac-proxy-self
        resources:
          limits:
            cpu: 50m
            memory: 40Mi
          requests:
            cpu: 50m
            memory: 40Mi
        securityContext:
--
        name: ocs-metrics-exporter
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
        securityContext:
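To confirm that the Deployment on a remediated release uses the defaults listed above, the following Python sketch compares each container's `resources` stanza with those values. It compares the quantity strings as the API returns them rather than parsing them into numbers. The script name `check_defaults.py` is hypothetical.

```python
import json
import sys

# Default values listed in this article for the remediated releases.
DEFAULTS = {
    "kube-rbac-proxy-main": {
        "limits": {"cpu": "50m", "memory": "40Mi"},
        "requests": {"cpu": "50m", "memory": "40Mi"},
    },
    "kube-rbac-proxy-self": {
        "limits": {"cpu": "50m", "memory": "40Mi"},
        "requests": {"cpu": "50m", "memory": "40Mi"},
    },
    "ocs-metrics-exporter": {
        "requests": {"cpu": "50m", "memory": "50Mi"},
    },
}


def container_resources(deployment: dict) -> dict:
    """Map container name -> resources stanza from a Deployment object."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return {c["name"]: c.get("resources", {}) for c in containers}


def non_default(deployment: dict) -> dict:
    """Return the containers whose resources differ from DEFAULTS (string comparison)."""
    actual = container_resources(deployment)
    return {
        name: actual.get(name)
        for name, expected in DEFAULTS.items()
        if actual.get(name) != expected
    }


def load_json(path: str) -> dict:
    """Load JSON from a file, or from stdin when path is '-'."""
    if path == "-":
        return json.load(sys.stdin)
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: oc get deploy ocs-metrics-exporter -n openshift-storage -o json | python3 check_defaults.py -
    diff = non_default(load_json(sys.argv[1]))
    print(json.dumps(diff, indent=2) if diff else "All three containers use the default resources")
```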

If OOM kills occur again, the CPU and memory resources can be changed through the StorageCluster resource, as shown in the following commands.
(The values shown only illustrate the change; they are not recommended values.)

$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"ocs-metrics-exporter": {"limits": {"cpu": "60m","memory": "70Mi"},"requests": {"cpu": "60m","memory": "70Mi"}}}}}'

$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"kube-rbac-proxy-self": {"limits": {"cpu": "70m","memory": "80Mi"},"requests": {"cpu":"70m","memory": "80Mi"}}}}}'

$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"kube-rbac-proxy-main": {"limits": {"cpu": "80m","memory": "90Mi"},"requests": {"cpu":"80m","memory": "90Mi"}}}}}'

$ oc get storagecluster -o yaml
    resourceProfile: balanced
    resources:
      kube-rbac-proxy-main:
        limits:
          cpu: 80m
          memory: 90Mi
        requests:
          cpu: 80m
          memory: 90Mi
      kube-rbac-proxy-self:
        limits:
          cpu: 70m
          memory: 80Mi
        requests:
          cpu: 70m
          memory: 80Mi
      mds:
        limits:
          cpu: "1"
          memory: 8Gi
        requests:
          cpu: "1"
          memory: 8Gi
      ocs-metrics-exporter:
        limits:
          cpu: 60m
          memory: 70Mi
        requests:
          cpu: 60m
          memory: 70Mi
    storageDeviceSets:
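The three `oc patch` commands above share one shape. The following Python sketch builds that merge patch for a single container and uses the same value for limits and requests, as the examples do. It rejects a CPU value without the `m` (millicore) suffix, because a bare number such as "60" means 60 whole CPUs. The strict formats (millicores for CPU, Mi for memory) and the script name `make_patch.py` are assumptions of this sketch, not ODF requirements.

```python
import json
import re
import sys

# Container names taken from the patch examples in this article.
CONTAINERS = {"kube-rbac-proxy-main", "kube-rbac-proxy-self", "ocs-metrics-exporter"}

# Deliberately strict formats for this sketch (an assumption, not an ODF rule):
# CPU in millicores such as "80m", memory in mebibytes such as "90Mi".
CPU_RE = re.compile(r"[1-9]\d*m")
MEM_RE = re.compile(r"[1-9]\d*Mi")


def resources_patch(container: str, cpu: str, memory: str) -> str:
    """Build a merge patch that sets identical limits and requests for one container."""
    if container not in CONTAINERS:
        raise ValueError(f"unknown container: {container!r}")
    if not CPU_RE.fullmatch(cpu):
        raise ValueError(f"CPU must be in millicores, e.g. '80m': {cpu!r}")
    if not MEM_RE.fullmatch(memory):
        raise ValueError(f"memory must be in Mi, e.g. '90Mi': {memory!r}")
    quantity = {"cpu": cpu, "memory": memory}
    patch = {"spec": {"resources": {container: {"limits": quantity, "requests": dict(quantity)}}}}
    return json.dumps(patch)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python3 make_patch.py kube-rbac-proxy-main 80m 90Mi
    print(resources_patch(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The generated patch can be passed directly to the same command, for example:
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch "$(python3 make_patch.py kube-rbac-proxy-main 80m 90Mi)"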

$ oc get deploy ocs-metrics-exporter -o yaml
(Output not shown; the CPU and memory limits and requests of each container will match the values set in the StorageCluster.)
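To check that the StorageCluster overrides were applied, the following Python sketch compares `spec.resources` of the StorageCluster with the container resources of the ocs-metrics-exporter Deployment. It only checks the three containers of this pod that have an override set; other entries such as `mds` are ignored. Exporting both objects to the hypothetical files `sc.json` and `deploy.json`, and the script name `compare_resources.py`, are assumptions of this sketch.

```python
import json
import sys

# The three containers of the ocs-metrics-exporter pod.
COMPONENTS = ("kube-rbac-proxy-main", "kube-rbac-proxy-self", "ocs-metrics-exporter")


def deployment_resources(deployment: dict) -> dict:
    """Map container name -> resources stanza from a Deployment object."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return {c["name"]: c.get("resources", {}) for c in containers}


def mismatches(storagecluster: dict, deployment: dict) -> dict:
    """Return overrides in the StorageCluster that the Deployment does not reflect."""
    wanted = storagecluster.get("spec", {}).get("resources", {})
    actual = deployment_resources(deployment)
    out = {}
    for name in COMPONENTS:
        # Containers without an override keep their defaults, so skip them
        if name in wanted and wanted[name] != actual.get(name):
            out[name] = {"storagecluster": wanted[name], "deployment": actual.get(name)}
    return out


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 compare_resources.py sc.json deploy.json
    with open(sys.argv[1]) as f:
        storagecluster = json.load(f)
    with open(sys.argv[2]) as f:
        deployment = json.load(f)
    diff = mismatches(storagecluster, deployment)
    if diff:
        print(json.dumps(diff, indent=2))
        sys.exit(1)
    print("Deployment resources match the StorageCluster overrides")
```

Usage example:
$ oc get storagecluster ocs-storagecluster -n openshift-storage -o json > sc.json
$ oc get deploy ocs-metrics-exporter -n openshift-storage -o json > deploy.json
$ python3 compare_resources.py sc.json deploy.json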

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.