Logging Collector Pods being restarted with OOMKill in RHOCP 4


Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat OpenShift Logging (RHOL)
    • 6
  • Vector
  • Collector pods

Issue

  • Logging collector pods are experiencing CrashLoopBackOff issue.
  • Multiple logging collector pods are in CrashLoopBackOff state.
  • The number of pods in CrashLoopBackOff state is increasing.
  • The Red Hat Logging stack was just configured and the collector pods are in CrashLoopBackOff.

Resolution

Delays or failures delivering the logs to the destination

Check whether the collector pods report any errors:

  1. Go to the OpenShift Console > Observe > Dashboards > Dashboard: Logging / Collection and review the Total errors last 60m panel for errors that could explain the back pressure

  2. Review the collector logs themselves for errors that could indicate a problem delivering the logs to the destinations

    $ for pod in $(oc get pods -l app.kubernetes.io/component=collector -o name -n <namespace>); do oc logs $pod -n <namespace> | grep -i error; done
    

If errors indicate failures delivering the logs, fix those first.
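For pods already in CrashLoopBackOff, the current container may not hold the relevant messages; the logs of the previously terminated container can be checked as well. A sketch, reusing the collector label from the command above (adjust the namespace to your environment):

```shell
# Inspect the logs of the previous (crashed) container instance of each
# collector pod and keep only the error lines.
for pod in $(oc get pods -l app.kubernetes.io/component=collector -o name -n openshift-logging); do
  echo "== $pod =="
  oc logs --previous $pod -n openshift-logging 2>/dev/null | grep -i error
done
```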

Normal activity, depending on the number of logs to process and send and the filters applied

Depending on the number of logs produced and the filters applied, the memory and CPU usage can differ from one collector to another. If the memory limit is hit, causing the pod to go into CrashLoopBackOff due to OOMKill, then unless the number of logs read and filtered can be reduced, the solution is to increase limits.memory.

To increase limits.memory, follow the Red Hat Documentation Section "Configure log collector CPU and memory limits".

Note: when the collector pods are configured for the first time, they need to read all the logs already available on the system, possibly from days ago. This generates extra pressure on memory and CPU that subsides once the backlog is read and only newly produced logs need to be processed. After this initial peak, consider reducing limits.cpu and limits.memory once the normal CPU and memory usage has been verified as indicated in the Red Hat Knowledge Article "Troubleshooting the Vector collector in RHOCP 4".
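As one possible way to apply that change, the memory limit can be raised on the ClusterLogForwarder resource. This is a sketch that assumes a ClusterLogForwarder named logging-collector in the openshift-logging namespace, with spec.collector.resources as the tuning point; verify the exact resource name and field path against the documentation section referenced above:

```shell
# Raise the collector memory limit to 4Gi (example value); the collector
# pods are redeployed with the new limit.
oc -n openshift-logging patch clusterlogforwarder logging-collector \
  --type merge \
  -p '{"spec":{"collector":{"resources":{"limits":{"memory":"4Gi"}}}}}'
```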

Root Cause

The collectors' memory usage can increase due to:

  • slowness delivering the logs, causing memory back pressure on the collector
  • an interruption in delivering the logs, causing back pressure on the collector that temporarily increases the memory and CPU usage
  • normal activity, depending on the number of logs to process and send and the filters applied. When the collectors start for the first time, they read all the logs already available on the system

If the OpenShift administrator has not set limits and/or requests for the collector, starting in RHOL v6 it runs with the following default requests and limits:

      "resources": {
        "limits": {
          "cpu": "6",
          "memory": "2Gi"
        },
        "requests": {
          "cpu": "500m",
          "memory": "64Mi"

Diagnostic Steps

  1. Set the environment variables

    $ cr="logging-collector"
    $ ns="openshift-logging"
    
  2. Verify that some collector pods are in CrashLoopBackOff or that the number of RESTARTS is not 0

    $ oc get pods -l app.kubernetes.io/instance=$cr -n $ns
    NAME                      READY   STATUS             RESTARTS   AGE
    logging-collector-5k2v6   0/1     OOMKilled          4          3m
    logging-collector-5w845   1/1     Running            0          3m
    logging-collector-7ndr2   0/1     CrashLoopBackOff   4          3m
    logging-collector-8tnkc   0/1     CrashLoopBackOff   4          3m
    logging-collector-9frsc   1/1     Running            0          3m
    logging-collector-gjxdw   0/1     CrashLoopBackOff   4          3m
    logging-collector-j6sq5   1/1     Running            0          3m
    logging-collector-pl5sw   1/1     Running            0          3m
    logging-collector-prdcj   1/1     Running            0          3m
    logging-collector-sxbtr   0/1     OOMKilled          4          3m
    logging-collector-xpv98   1/1     Running            0          3m
    logging-collector-xr5k2   0/1     OOMKilled          4          3m
    logging-collector-z6n6s   1/1     Running            0          3m
    
  3. Verify that the pods are in CrashLoopBackOff because they were OOMKilled

    $ oc get pods -l app.kubernetes.io/instance=$cr -n $ns -o yaml | grep OOMKill
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
          reason: OOMKilled
    
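The same check can be done per pod with a jsonpath expression, listing each pod together with the last termination reason of its container (a sketch; the variables follow step 1):

```shell
# List each collector pod with its container's last termination reason
# (if any); OOMKilled confirms the memory limit was hit.
oc get pods -l app.kubernetes.io/instance=$cr -n $ns \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
```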

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.