ODF: The ocs-metrics-exporter pod is repeatedly killed by OOM.
Environment
Red Hat OpenShift Container Platform (OCP) 4.18+
Red Hat OpenShift Data Foundation (ODF) 4.18+
Issue
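The ocs-metrics-exporter pod is repeatedly killed by OOM. In the examples below, the kube-rbac-proxy-main and kube-rbac-proxy-self containers of the pod are terminated with exit code 137 and reason OOMKilled.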
Examples:
OCP Cluster ID: Redacted
Ceph Cluster ID: Redacted
OCP Version: 4.18.15
ODF Version: 4.18.6
$ oc get pod ocs-metrics-exporter-6f945875f-mpgcz -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
{"name":"kube-rbac-proxy-main","restarts":109,"exitCode":137,"reason":"OOMKilled"}
{"name":"kube-rbac-proxy-self","restarts":23,"exitCode":137,"reason":"OOMKilled"}
{"name":"ocs-metrics-exporter","restarts":0,"exitCode":null,"reason":null}
OCP Cluster ID: Redacted
Ceph Cluster ID: Redacted
OCP Version: 4.18.15
ODF Version: 4.18.6
$ oc get pod ocs-metrics-exporter-6f945875f-6nc4l -o json | jq -c '.status.containerStatuses[] | {name: .name, restarts: .restartCount, exitCode: .lastState.terminated.exitCode, reason: .lastState.terminated.reason}'
{"name":"kube-rbac-proxy-main","restarts":2,"exitCode":137,"reason":"OOMKilled"}
{"name":"kube-rbac-proxy-self","restarts":7,"exitCode":137,"reason":"OOMKilled"}
{"name":"ocs-metrics-exporter","
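The exitCode of 137 in the output above follows the common convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is the signal sent when a container is OOM-killed. A small sketch of that arithmetic follows; the function name `exit_code_signal` is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a container exit code into the signal number
# that terminated it. Exit codes above 128 conventionally mean
# "terminated by signal (exit code - 128)".
exit_code_signal() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    # 128 or lower: the process exited on its own, no signal involved.
    echo "none"
  fi
}

exit_code_signal 137   # prints: 9 (SIGKILL)
```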
Resolution
The issue is resolved in the releases listed below and later. Red Hat recommends upgrading ODF to one of these versions or later to remove this issue from your OpenShift environment. As part of the code remediation, the CPU and memory resources can now be changed; see the Diagnostic Steps section of this solution.
| Product/Version | Related BZ/Jira | Errata | Fixed Version |
|---|---|---|---|
| ODF/4.20 | Jira DFBUGS-3286 | Errata RHSA-2025:21704 | 4.20.0 |
| ODF/4.19 | Jira DFBUGS-4124 | Errata RHSA-2025:21378 | 4.19.7 |
| ODF/4.18 | Jira DFBUGS-4125 | Errata RHSA-2025:21368 | 4.18.12 |
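As a rough aid for reading the table above, the following sketch compares an ODF version string against the fixed version for its minor stream. The function name `odf_has_fix` is hypothetical, the fixed versions are copied from the table, and `sort -V` (GNU coreutils) is assumed to be available. Streams that are not in the table are reported as "unknown" rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an ODF version already contains the fix.
# Prints "yes", "no", or "unknown" (stream not listed in the table).
odf_has_fix() {
  local version="$1" stream fixed lowest
  stream="${version%.*}"            # e.g. 4.18.6 -> 4.18

  # Fixed versions per stream, taken from the table above.
  case "$stream" in
    4.18) fixed="4.18.12" ;;
    4.19) fixed="4.19.7" ;;
    4.20) fixed="4.20.0" ;;
    *)    echo "unknown"; return 0 ;;
  esac

  # Version sort: if the fixed version sorts first (or is equal),
  # the given version is at or above the fix.
  lowest="$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n 1)"
  if [ "$lowest" = "$fixed" ]; then
    echo "yes"
  else
    echo "no"
  fi
}

odf_has_fix 4.18.6    # prints: no (the version from the Issue examples)
```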
Root Cause
A code issue, combined with the inability to easily change the CPU and memory resources of the affected containers.
Diagnostic Steps
The ocs-metrics-exporter is a Pod with three active containers: kube-rbac-proxy-main, kube-rbac-proxy-self, and ocs-metrics-exporter.
With the remediated code, the CPU and Memory resources are set to these default values:
$ oc get deploy ocs-metrics-exporter -oyaml | grep -B11 -A3 resources:
name: kube-rbac-proxy-main
resources:
  limits:
    cpu: 50m
    memory: 40Mi
  requests:
    cpu: 50m
    memory: 40Mi
securityContext:
--
name: kube-rbac-proxy-self
resources:
  limits:
    cpu: 50m
    memory: 40Mi
  requests:
    cpu: 50m
    memory: 40Mi
securityContext:
--
name: ocs-metrics-exporter
resources:
  requests:
    cpu: 50m
    memory: 50Mi
securityContext:
Also, the CPU and Memory resources can be changed if future OOM errors occur.
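(The values shown only illustrate a change; they are not suggested values. Note that the ocs-metrics-exporter CPU limit in the first command is "60", which Kubernetes interprets as 60 cores rather than 60 millicores; use the m suffix, for example "60m", when a millicore value is intended.)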
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"ocs-metrics-exporter": {"limits": {"cpu": "60","memory": "70Mi"},"requests": {"cpu": "60m","memory": "70Mi"}}}}}'
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"kube-rbac-proxy-self": {"limits": {"cpu": "70m","memory": "80Mi"},"requests": {"cpu":"70m","memory": "80Mi"}}}}}'
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"kube-rbac-proxy-main": {"limits": {"cpu": "80m","memory": "90Mi"},"requests": {"cpu":"80m","memory": "90Mi"}}}}}'
$ oc get storagecluster -o yaml
resourceProfile: balanced
resources:
  kube-rbac-proxy-main:
    limits:
      cpu: 80m
      memory: 90Mi
    requests:
      cpu: 80m
      memory: 90Mi
  kube-rbac-proxy-self:
    limits:
      cpu: 70m
      memory: 80Mi
    requests:
      cpu: 70m
      memory: 80Mi
  mds:
    limits:
      cpu: "1"
      memory: 8Gi
    requests:
      cpu: "1"
      memory: 8Gi
  ocs-metrics-exporter:
    limits:
      cpu: "60"
      memory: 70Mi
    requests:
      cpu: 60m
      memory: 70Mi
storageDeviceSets:
$ oc get deploy ocs-metrics-exporter -o yaml
{Output not shown, but the CPU and Memory limits and requests will match the values set in the StorageCluster}
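The three patch commands above differ only in the container name and the resource values. A minimal sketch of a helper that builds the merge patch for one container follows. The function name `build_resource_patch` and its argument order are hypothetical, and the values in the usage line are placeholders, not recommendations. The helper only prints JSON, so nothing changes on the cluster until the output is passed to `oc patch`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the StorageCluster merge patch for one container.
# Arguments: container name, CPU quantity, memory quantity.
# Limits and requests are set to the same values, as in the
# kube-rbac-proxy commands above.
build_resource_patch() {
  local name="$1" cpu="$2" memory="$3"
  printf '{"spec":{"resources":{"%s":{"limits":{"cpu":"%s","memory":"%s"},"requests":{"cpu":"%s","memory":"%s"}}}}}\n' \
    "$name" "$cpu" "$memory" "$cpu" "$memory"
}

# Print the patch for review (placeholder values, not recommendations).
build_resource_patch kube-rbac-proxy-main 80m 90Mi

# To apply it, pass the output to the same command used above:
# oc patch -n openshift-storage storagecluster ocs-storagecluster \
#   --type merge --patch "$(build_resource_patch kube-rbac-proxy-main 80m 90Mi)"
```

Writing the CPU quantity once and reusing it for both limits and requests avoids a mismatch such as "60" in one field and "60m" in the other.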
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.