Pod csi-addons-controller-manager OOMKilled - OpenShift Data Foundation (ODF)
Environment
- Red Hat OpenShift Data Foundation (RHODF) v4.10+
Issue
In some instances, usually with high PVC counts, the csi-addons-controller-manager pod managed by the csi-addons-controller-manager CSV gets OOMKilled.
If instead the csi-cephfsplugin, csi-rbdplugin, csi-cephfsplugin-provisioner, csi-rbdplugin-provisioner, ocs-operator, ocs-metrics-exporter, odf-console, or odf-operator-controller-manager pods show a high number of restarts, see the Certain Pods Have High Restarts - OpenShift Data Foundation (ODF) solution.
Resolution
- Validate and note the name of the odf-csi-addons-operator subscription:
$ oc get sub -n openshift-storage
- Edit the odf-csi-addons-operator subscription with:
$ oc edit sub -n openshift-storage odf-csi-addons-operator-stable-<version>-redhat-operators-openshift-marketplace
- Increase the memory values to 800Mi under the config section:
Example:
$ oc edit sub -n openshift-storage odf-csi-addons-operator-stable-4.18-redhat-operators-openshift-marketplace   <--- match name with output from step 1
  config:
    resources:
      limits:
        cpu: "1"
        memory: 512Mi   <---- Increase to 800Mi
      requests:
        cpu: 10m
        memory: 64Mi    <---- Increase to 800Mi
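As a non-interactive alternative to oc edit, the same change could be applied with a merge patch. The following is a minimal sketch, not a verified procedure: it assumes the config section sits under spec in the Subscription, and it keeps the CPU values from the example above. The helper build_resources_patch and the SUB_NAME variable are hypothetical names introduced here for illustration; SUB_NAME must be set to the subscription name noted in step 1.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON merge patch that sets both the memory request and the
# memory limit of the csi-addons operator Subscription to the same value.

# build_resources_patch MEMORY
# Hypothetical helper: prints a merge patch for spec.config.resources.
# CPU values are copied unchanged from the example in this article.
build_resources_patch() {
  local mem="$1"
  printf '{"spec":{"config":{"resources":{"limits":{"cpu":"1","memory":"%s"},"requests":{"cpu":"10m","memory":"%s"}}}}}\n' "$mem" "$mem"
}

# SUB_NAME is an assumption: export it with the name returned by
# "oc get sub -n openshift-storage" before running this script.
SUB_NAME="${SUB_NAME:-}"
PATCH="$(build_resources_patch 800Mi)"

# Always show the patch so it can be reviewed before anything is applied.
echo "$PATCH"

# Apply the patch only when a subscription name was supplied and oc is present.
if [ -n "$SUB_NAME" ] && command -v oc >/dev/null 2>&1; then
  oc patch sub "$SUB_NAME" -n openshift-storage --type merge -p "$PATCH"
fi
```

Running the script without SUB_NAME only prints the patch, which makes it possible to review the JSON before it is applied to the cluster.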
Root Cause
The default memory limit of 512Mi could be too low for some installations.
See the associated bugzilla for further information.
Diagnostic Steps
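- The csi-addons-controller-manager pod status shows the manager container in CrashLoopBackOff, with the last termination reason OOMKilled: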
    - restartCount: 55
      started: false
      ready: false
      name: manager   <-------------------- Container
      state:
        waiting:
          reason: CrashLoopBackOff
          message: >-
            back-off 5m0s restarting failed container=manager
            pod=csi-addons-controller-manager-54594877db-4wm8t_openshift-storage(c0952297-a7dc-4093-ad02-ce1fe3d45b9c)
      imageID: >-
        registry.redhat.io/odf4/odf-csi-addons-rhel8-operator@sha256:8a7dfdfd9e851b0b68481726e9fb2946075d09385782aca5c6e70babc8763234
      image: >-
        registry.redhat.io/odf4/odf-csi-addons-rhel8-operator@sha256:c13cd4dbe18b4888a9be2dc1c94709d33735b4c030119daf50879808a3ab31f0
      lastState:
        terminated:
          exitCode: 137   <------------------------ Exit code for OOMKill
          reason: OOMKilled   <-------------------- OOMKill
          startedAt: '2024-03-22T17:48:59Z'
          finishedAt: '2024-03-22T17:49:26Z'
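The check above can also be scripted. The sketch below is an illustration only: was_oomkilled and POD_NAME are hypothetical names introduced here, and the helper simply searches the pod YAML for the reason: OOMKilled line shown in the output above. POD_NAME must be set to the actual pod name, for example the one reported by oc get pods -n openshift-storage.

```shell
#!/usr/bin/env bash
# Sketch: detect an OOMKilled termination in a pod's YAML.

# was_oomkilled
# Hypothetical helper: reads pod YAML on stdin and succeeds (exit 0) if any
# container status records a termination with reason OOMKilled.
was_oomkilled() {
  grep -Eq 'reason:[[:space:]]*OOMKilled'
}

# POD_NAME is an assumption: export it with the name of the
# csi-addons-controller-manager pod to inspect.
POD_NAME="${POD_NAME:-}"

# Query the cluster only when a pod name was supplied and oc is present.
if [ -n "$POD_NAME" ] && command -v oc >/dev/null 2>&1; then
  if oc get pod "$POD_NAME" -n openshift-storage -o yaml | was_oomkilled; then
    echo "$POD_NAME: a container was OOMKilled"
  else
    echo "$POD_NAME: no OOMKilled termination recorded"
  fi
fi
```

The helper can also be fed a saved pod definition, for example from a must-gather, by redirecting the file to its standard input.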
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.