Pods in the openshift-vsphere-infra namespace are too verbose in RHOCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.14
    • 4.15
    • 4.16
  • vSphere IPI installation
  • Red Hat OpenShift Logging (RHOL)

Issue

  • The log storage is filling up because of the logs produced by the keepalived-monitor and the haproxy pods in the openshift-vsphere-infra namespace:

    $ oc logs keepalived-master-example-0 -c keepalived-monitor -n openshift-vsphere-infra
    2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
    2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
    2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
    2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
    $ oc logs haproxy-master-0-example -c haproxy-monitor -n openshift-vsphere-infra
    ...
    2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
    2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"
    
  • The keepalived-monitor and the haproxy pods in the openshift-vsphere-infra namespace log at INFO level messages that should be logged at DEBUG level.

Resolution

This issue has been reported to Red Hat engineering. It is being tracked in separate bugs:

For "logs of runtimecfg node-ip detection too verbose":

For "Logs of haproxy too verbose":

Workaround

Workaround 1. Collecting the logs, but reducing the log retention for this namespace

If using Elasticsearch as the log store, log retention can be configured per namespace, as described in the article "Configure log retention per namespace for OpenShift Elasticsearch log store", to reduce the retention time for the openshift-vsphere-infra namespace.

If using Loki as the log store, retention can be configured as detailed in the documentation section "Enabling stream-based retention with Loki".

Workaround 2. Not collecting the logs

Dropping the logs of this namespace before they are forwarded is detailed in the documentation section "Filtering logs by content".

An example is shared below:

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <name>
  namespace: <namespace>
spec:
  filters:
  - drop:
    - test:
      - field: .kubernetes.namespace_name
        matches: openshift-vsphere-infra
    name: drop-vsphere-infra-logs
    type: drop
  pipelines:
  - filterRefs:
    - drop-vsphere-infra-logs
    inputRefs:
    - infrastructure
    - application
    name: exclude-vsphere-infra-logs
    outputRefs:
    - default

For more information, please open a new support case with Red Hat Support.

Root Cause

The keepalived-monitor and the haproxy pods in the openshift-vsphere-infra namespace log at INFO level messages that should be logged at DEBUG level.

Diagnostic Steps

  • If using Elasticsearch as the log store with Kibana as the visualizer, go to Kibana > Discover > Available Fields, click kubernetes.pod_name, and press Visualize to see which pods produce the most logs. Verify that the index pattern is infra-*.

  • Verify that the haproxy pods print 2 log lines every 6 seconds for each master. With three masters, this means 6 messages in the same second, or 60 messages per minute per pod:

    $ oc logs haproxy-master-0-example -c haproxy-monitor -n openshift-vsphere-infra
    ...
    2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
    2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"
    
  • Verify that the keepalived pods print 4 messages per node every 10 seconds:

    $ oc logs keepalived-master-example-0 -c keepalived-monitor -n openshift-vsphere-infra | grep worker-example-0 | grep 2024-02-15T08:20:21
    2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
    2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
    2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
    2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
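
The message rates described in the steps above can also be checked by counting log lines rather than reading them by eye. The following is a minimal sketch, not an official tool: the helper names count_per_minute and count_per_node are hypothetical, and it assumes the logs are read with timestamps so that each line starts with an RFC 3339 timestamp, as in the samples in this article.

```shell
#!/bin/sh
# Sketch: summarize log volume from "oc logs" output read on stdin.
# count_per_minute and count_per_node are hypothetical helpers, not part of oc.

# Print "<minute> <count>": number of log lines per minute.
# Assumes each line starts with a timestamp such as
# 2024-02-15T08:20:00.517159455Z, so the first 16 characters identify the minute.
count_per_minute() {
  cut -c1-16 | sort | uniq -c | awk '{print $2, $1}'
}

# Print "<count> <node>": number of "Searching for Node IP" lines per node,
# highest count first. Only these lines are counted, so the total number of
# messages per node is roughly twice the value shown.
count_per_node() {
  sed -n 's/.*Searching for Node IP of \([^ ]*\)\. Using.*/\1/p' |
    sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example usage against a cluster (pod names as used in this article):
#   oc logs haproxy-master-0-example -c haproxy-monitor \
#     -n openshift-vsphere-infra --timestamps --since=10m | count_per_minute
#   oc logs keepalived-master-example-0 -c keepalived-monitor \
#     -n openshift-vsphere-infra --since=10m | count_per_node
```

If the counts per minute are close to the rates listed above, the pod is affected by this issue.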
    