Log collector Vector has high memory consumption in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 4
- Red Hat OpenShift Logging (RHOL)
  - 5.6
  - 5.7
- Vector
Issue
- The log collector consumes excessive memory, even on Logging 5.7.6 or later, the versions that address the bug described in the article Vector cpu and memory usage is increasing over the time in RHOCP 4.
Resolution
Red Hat investigated this issue in bug report LOG-4536 and delivered a fix in RHOL 5.7.9 through errata RHBA-2023:7718.
If this issue still occurs in the environment after updating, open a support case in the Red Hat Customer Portal referring to this solution.
Workaround
Until the environment runs a release that contains the fix, it is good practice to set limits.memory and limits.cpu for the collector. The article How to define the collector in RHOCP 4 explains how to do so.
For more information, please open a new support case with Red Hat Support.
Root Cause
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
At the time bug LOG-4536 was reported, Vector was configured with a buffer.max_events of 500, request.retry_attempts set to an unlimited number of retries, and a retry policy under which failed requests (status == 429, >= 500, and != 501) are retried forever.
As a result, when Vector cannot deliver logs to a destination and the failures return one of the retried statuses, the requests are retried indefinitely and the memory usage of Vector keeps growing.
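The retry rule described above can be sketched as a small shell function. This is only an illustration, not Vector's actual implementation: the function name should_retry is hypothetical, and it assumes the rule reads as "status 429, or any status of 500 and above except 501".

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Vector: mirrors the retry rule quoted in this article.
# Assumed reading of the rule: retry on 429, or on any status >= 500 except 501.
# Returns 0 when the request would be retried, 1 otherwise.
should_retry() {
  local status="$1"
  if [ "$status" -eq 429 ]; then
    return 0
  fi
  if [ "$status" -ge 500 ] && [ "$status" -ne 501 ]; then
    return 0
  fi
  return 1
}

# Show the decision for a few sample HTTP status codes.
for s in 200 404 429 500 501 503; do
  if should_retry "$s"; then
    echo "$s: retried"
  else
    echo "$s: not retried"
  fi
done
```

With unlimited retry attempts, every response that falls in the "retried" group stays queued for redelivery, which is consistent with the memory growth described above.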
Diagnostic Steps
Verify the memory used by the collector pods:
$ oc adm top pods -l component=collector -n openshift-logging
collector-hnrvr 73m 169Mi
collector-pj8dh 203m 344Mi
collector-4rflv 119m 573Mi
collector-x5vhh 253m 1327Mi
collector-r8q4c 521m 3054Mi
collector-cnjn4 273m 4639Mi
collector-czx7r 431m 6129Mi
collector-f76z5 453m 39926Mi
collector-wd8hf 886m 41738Mi
collector-t682g 716m 44501Mi
collector-5c44p 475m 49746Mi
collector-ffzdw 298m 100878Mi
collector-5m22l 216m 142520Mi
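To spot the heaviest pods quickly in a long listing, the output above can be filtered by memory usage. The following is a minimal sketch: the function name pods_over_threshold and the 2048 Mi threshold are arbitrary choices for illustration, and it assumes memory is reported with the Mi suffix as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods whose memory usage exceeds a threshold given in Mi.
# Reads lines of the form "<pod> <cpu> <memory>Mi" on stdin; other lines
# (for example a header line) are ignored.
pods_over_threshold() {
  local threshold_mi="$1"
  awk -v limit="$threshold_mi" '
    $3 ~ /^[0-9]+Mi$/ {
      mem = $3
      sub(/Mi$/, "", mem)              # strip the unit to compare numerically
      if (mem + 0 > limit + 0) print $1, mem "Mi"
    }
  '
}

# Example using three lines taken from the output above.
printf '%s\n' \
  'collector-hnrvr 73m 169Mi' \
  'collector-x5vhh 253m 1327Mi' \
  'collector-r8q4c 521m 3054Mi' | pods_over_threshold 2048
```

Against a live cluster, the same function could be fed directly from the command shown above, for example: oc adm top pods -l component=collector -n openshift-logging | pods_over_threshold 2048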
If limits.memory is set for the collector pods, the pods may be observed restarting with OOMKilled. Verify that the real problem is the one described in this article, and not simply that the collector pods need more memory or that another issue is involved.
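One way to check for OOMKilled restarts is to read the last termination reason of each collector container. The sketch below is an assumption about how this could be scripted, not a procedure taken from this article: the function name oomkilled_pods and the sample pod names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pods whose last container
# termination reason includes OOMKilled.
# Reads tab-separated "<pod><TAB><reason(s)>" lines on stdin.
oomkilled_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Example with made-up pod names: only the first pod is reported.
printf 'collector-aaaaa\tOOMKilled\ncollector-bbbbb\t\n' | oomkilled_pods

# Against a cluster (requires oc and access to the openshift-logging namespace):
# oc get pods -l component=collector -n openshift-logging \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#   | oomkilled_pods
```

A pod appearing in this list only shows that a memory limit was reached; whether the cause is the retry behavior described in this article still has to be confirmed separately.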