Migrating the log collector from Fluentd to Vector while reducing the number of duplicated logs in RHOCP 4


Table of Contents

Introduction
How many logs will be duplicated after the migration?
Prerequisites
Current Stack
Deploy a syslog server running in the same OpenShift cluster as the collector pods
The multi log forwarder configuration
Migration for log forwarding using vector collectors
Additional configurations
BONUS: delete resources

Introduction

The following document describes, in a simplified way, how to migrate the Red Hat OpenShift Logging collector from fluentd to vector. This guide does not claim to cover all possible scenarios, and it is recommended to test the procedures described here in a lab before applying them in a production environment.

After applying the following steps:

  • The collector type in use will be vector
  • Some duplicated logs could still exist, but their number should be reduced

Tested with:

  • Red Hat OpenShift 4.16.38
  • Red Hat OpenShift Logging v5.8.19

Note: for a simpler procedure in which duplicated logs will exist, follow the article "To migrate default log store from Elasticsearch to Loki in Red Hat OpenShift Logging"

Migrating the collector type from fluentd to vector implies that the Vector collector will start reading all the logs available on the nodes from the beginning, because the Fluentd position files, which contain the last position read from each log file, cannot be migrated from the Fluentd to the Vector collectors. This implies:

  • Duplicated logs in the log storage
  • A peak of CPU and memory usage on the newly started vector pods, as they immediately try to read all the logs available on the system: old logs plus new logs arriving
  • An impact on the storage used by the Log Store
  • 429 Too Many Requests responses when using Loki, as a consequence of the rate limit and stream rate limits. These logs will not be lost, as they are retried

For more details, read "Vector and fluentd comparative for help on transition/adoption".

The multi log forwarder feature will be used to run new collector pods with vector as the collector type in parallel, so that the vector checkpoint files are brought up to date before the log forwarding switches over to the real log storage.

How many logs will be duplicated after the migration?

Duplicated logs = (last log line read by Fluentd) - (last log line read by Vector, as visible in the rsyslog server)
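As an illustration, the estimate is a simple subtraction; the line numbers below are made up for the example, not taken from a real cluster:

```shell
# Hypothetical example of the estimate above; both values are made up.
# last_fluentd: last log line read by Fluentd before the switch
# last_vector:  last log line read by Vector, as seen in the rsyslog server
last_fluentd=120000
last_vector=118500
echo "Approximate duplicated log lines: $((last_fluentd - last_vector))"
```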

Prerequisites

  1. The Red Hat OpenShift Logging Operator installed, at least at one of the v5.8 releases
  2. Sufficient resources on the target nodes to run the fluentd and vector pods in parallel
  3. A spec.collection definition that follows the new API; otherwise, some commands may need to be modified to adapt the paths.

Current Stack

The current clusterLogging configuration used for this example is shown below. It represents a fully managed OpenShift Logging stack where only the application and infrastructure logs are collected (the default stack when no clusterLogForwarder CR is set). Read the section "Additional configurations" for when audit logs or additional inputs are defined.

Disclaimer: the stack might vary regarding the resources, nodes, tolerations, selectors, collector type, and backend storage used

$ oc get clusterlogging instance -o yaml 
apiVersion: logging.openshift.io/v1
kind: ClusterLogging

--- OUTPUT OMITTED ---
spec:
  collection:
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    type: fluentd
--- OUTPUT OMITTED ---

Deploy a syslog server running in the same OpenShift cluster as the collector pods

NOTE: The syslog server writes the logs to the file /var/log/messages inside the syslog pod. This path is ephemeral storage, that is, storage from the node. Therefore, consider:

  • creating a PVC and mounting it in the rsyslog pod, so that the node filesystem does not fill up
  • or implementing rotation of the /var/log/messages file in the rsyslog server pod
1. Create additional configuration for rsyslog
$ cat > rsyslog-add.conf << EOF
\$ModLoad imtcp
\$ModLoad imudp

\$InputTCPServerRun 6514
\$UDPServerRun 6514

# Increase the amount of open files rsyslog is allowed, which includes open tcp sockets
# This is important if there are many clients.
# http://www.rsyslog.com/doc/rsconf1_maxopenfiles.html
\$MaxOpenFiles 2048
*.*                                                  /var/log/messages
EOF
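For reference, the legacy $ModLoad/$InputTCPServerRun directives above can also be written in the modern RainerScript syntax supported by rsyslog v8+ (functionally equivalent sketch, minus the $MaxOpenFiles tuning):

```
module(load="imtcp")
module(load="imudp")

input(type="imtcp" port="6514")
input(type="imudp" port="6514")

*.* action(type="omfile" file="/var/log/messages")
```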
2. Create all the OpenShift objects
$ oc new-project rsyslog-pj
$ oc new-app --name rsyslog-server --docker-image=registry.access.redhat.com/rhel7/rsyslog
$ oc create sa rsyslog-server-sa
$ oc adm policy add-scc-to-user anyuid -z rsyslog-server-sa
$ oc set serviceaccount deployment/rsyslog-server rsyslog-server-sa
$ oc create cm rsyslog-cm --from-file=rsyslog-add.conf=rsyslog-add.conf
$ oc set volume deployment/rsyslog-server --add -m /etc/rsyslog.d/ --configmap-name=rsyslog-cm
$ oc expose deployment rsyslog-server --port=6514

The multi log forwarder configuration

Grant permissions to read the logs

NOTE: for all the details, read Log Forwarding.

Step 1. Create the serviceAccount to be used in the Log Forwarder:
$ oc create sa fluentd2vector -n openshift-logging

Starting in Red Hat OpenShift Logging v5.8, three new clusterRoles exist, one for each type of log to collect:

  • collect-application-logs: allows collecting application logs
  • collect-infrastructure-logs: allows collecting infrastructure logs
  • collect-audit-logs: allows collecting audit logs

In this example, the clusterRoleBindings are created to grant permissions to collect the application and infrastructure logs. If the audit logs are also collected, a clusterRoleBinding to the clusterRole collect-audit-logs must be created as well.

$ oc create clusterrolebinding collect-app-logs-fluentd2vector --clusterrole=collect-application-logs --serviceaccount openshift-logging:fluentd2vector
$ oc create clusterrolebinding collect-infra-logs-fluentd2vector --clusterrole=collect-infrastructure-logs --serviceaccount openshift-logging:fluentd2vector
Step 2. Verify that the clusterRoleBindings were created:
$ oc get clusterrolebinding collect-app-logs-fluentd2vector 
NAME                              ROLE                                   AGE
collect-app-logs-fluentd2vector   ClusterRole/collect-application-logs   6m1s 

$ oc get clusterrolebinding collect-infra-logs-fluentd2vector 
NAME                                ROLE                                      AGE
collect-infra-logs-fluentd2vector   ClusterRole/collect-infrastructure-logs   4m59s
Create the clusterLogging CR

Create a clusterLogging CR to set the resources (limits and requests) and, if needed, the nodeSelector and/or tolerations. In this example, the ClusterLogging CR is called migration.

NOTE: It is important to set the limits.{cpu,memory}. When starting, or during the migration, some CPU and/or memory alarms for the pods called migration-* could be triggered, and it could be desirable to silence them.

$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "migration"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    type: "vector"
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
EOF
Create the clusterLogForwarder CR
Step 1. Create a clusterLogForwarder CR collecting the same types of logs that fluentd collects. The clusterLogForwarder must have the same name as the clusterLogging CR; in this example, migration.

Read the section "Additional configurations" for when audit logs or additional inputs are defined

$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: migration 
  namespace: openshift-logging 
spec:
  serviceAccountName: fluentd2vector
  outputs:
  - name: logs
    type: syslog
    url: tcp://rsyslog-server.rsyslog-pj.svc:6514
  pipelines: 
   - name: all
     inputRefs:
     - application
     - infrastructure
     outputRefs:
     - logs
EOF
Step 2. Verify new collectors are running
$ oc get pods -l app.kubernetes.io/instance=migration -n openshift-logging
NAME              READY   STATUS    RESTARTS   AGE
migration-54q7t   1/1     Running   0          6m33s
migration-7ph2x   1/1     Running   0          6m33s
migration-g95xx   1/1     Running   0          6m33s
migration-gvwvj   1/1     Running   0          6m33s
migration-lbhj4   1/1     Running   0          6m33s
migration-tp8zf   1/1     Running   0          6m33s
Verify that the logs are arriving at the syslog server
$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -f /var/log/messages

Migration for log forwarding using vector collectors

1. Verify Vector is reading the current logs produced by applications.

Vector needs to read all the logs on the nodes starting from zero, so it can take a while until Vector is aligned with the latest logs produced. This can be verified by checking the last log message in the rsyslog server:

$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -1 /var/log/messages 
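One hedged way to judge whether Vector has caught up is to compare the timestamp of that last /var/log/messages line against the current time. The helper below is an illustrative sketch, assuming GNU date and the default RFC 3164 syslog timestamp format (e.g. "Jan 10 14:05:33"):

```shell
# Hypothetical helper: print how many seconds behind "now" a syslog
# timestamp is. Assumes GNU date and that the timestamp is from the
# current year (RFC 3164 timestamps carry no year).
log_lag_seconds() {
  local last_epoch now_epoch
  last_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  echo $(( now_epoch - last_epoch ))
}

# Illustrative call with a timestamp generated 90 seconds in the past:
log_lag_seconds "$(date -d '90 seconds ago' '+%b %e %H:%M:%S')"
```

In practice, you would feed it the timestamp extracted from the `tail -1` output shown above; a lag of a few seconds suggests Vector is aligned.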
2. Stop the Vector pods
$ oc -n openshift-logging patch clusterlogging/migration --type=merge -p '{"spec":{"collection":{"nodeSelector":{"collectors": "stop"}}}}'
3. Verify the last log entry in the syslog server
$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -1 /var/log/messages
4. Verify that the last log entry in the syslog server is also present in the Log Store

If the Log Store is Elasticsearch, check in Kibana; if the Log Store is Loki, check in the "OpenShift Console > Observe > Logs" view.

5. Stop the Fluentd pods

Once it is confirmed that the last log entry read by Vector was also read by Fluentd and is present in the Log Store, stop the Fluentd pods:

$ oc -n openshift-logging patch clusterlogging/instance --type=merge -p '{"spec":{"collection":{"nodeSelector":{"collectors": "stop"}}}}'
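Before moving on, it can help to confirm that no fluentd collector pod is still running. The sketch below simulates the `oc get pods` output with a sample variable so the counting logic can be shown on its own; in a real cluster, replace sample_pods with the output of something like `oc -n openshift-logging get pods -l component=collector --no-headers` (label assumed from the default collector daemonset):

```shell
# sample_pods simulates the pod listing; the pod names and ages are made up.
sample_pods='collector-abc12   1/1   Terminating   0   3d
collector-def34   1/1   Terminating   0   3d'

# Count pods that are NOT yet terminating; 0 means the patch took effect.
remaining=$(echo "$sample_pods" | grep -cv Terminating || true)
echo "Collector pods not yet terminating: $remaining"
```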
6. Copy the vector checkpoints to the clusterLogging instance path and verify them:
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "cp -Ra /var/lib/vector/openshift-logging/migration/* /var/lib/vector/" ; done

$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "ls -ld /var/lib/vector/*"; done
7. Change collector type from fluentd to vector

Change the collector type from fluentd to vector in the clusterlogging instance:

$ oc -n openshift-logging patch clusterlogging/instance --type=merge -p '{"spec":{"collection":{"type":"vector"}}}'
8. Remove the nodeSelector in the clusterlogging/instance for starting the vector pods:
$ oc -n openshift-logging patch clusterlogging/instance --type='json' -p='[{"op": "remove", "path": "/spec/collection/nodeSelector"}]'
9. Verify the collector pods are running and using Vector as collector type:
$ pod=$(oc -n openshift-logging get pods -l component=collector -o jsonpath='{.items[0].metadata.name}')
$ oc -n openshift-logging get pod $pod -o yaml  |grep  image: 
    image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b6691ffe1e58a570cbe6f5e12dec55305880935e14e84de1a361931b09b36047
    image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b6691ffe1e58a570cbe6f5e12dec55305880935e14e84de1a361931b09b36047
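As an extra, hedged check, you can assert that none of the reported images is still a fluentd image. Here the `grep image:` output is simulated with a sample variable (image digests truncated for the example) so the logic can be shown on its own; in practice, capture the real output of the command above:

```shell
# sample_images simulates the `grep image:` output shown above.
sample_images='image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b669...
image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b669...'

if echo "$sample_images" | grep -q fluentd; then
  echo "WARNING: a fluentd image is still in use"
else
  echo "All collector images are vector-based"
fi
```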

Additional configurations

This is a quick example for when a clusterLogForwarder CR instance already exists and is collecting:

  • audit logs
  • an input collecting only logs from the namespace app-dev

Original clusterLogForwarder CR instance example:

$ oc get clusterlogforwarder instance -o yaml -n openshift-logging
[...]
spec:
  inputs:
  - application:
      namespaces:
      - app-dev
    name: app
  outputs:
  - http:
      method: POST
    name: httpreceiver
    type: http
    url: http://httpreceiver.svc.cluster.local:9880/kubernetes
  pipelines:
  - inputRefs:
    - application
    - infrastructure
    - audit
    name: all-logs
    outputRefs:
    - default
  - inputRefs:
    - app
    name: external-app
    outputRefs:
    - httpreceiver

--- OUTPUT OMITTED ---
Step 1. Grant permissions to read the audit logs
$ oc create clusterrolebinding collect-audit-logs-fluentd2vector --clusterrole=collect-audit-logs --serviceaccount openshift-logging:fluentd2vector
Step 2. Create the clusterLogForwarder CR migration

The inputs and inputRefs from the clusterLogForwarder CR instance must also exist in the migration CR and be used to forward the logs to the syslog server:

$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: migration 
  namespace: openshift-logging 
spec:
  serviceAccountName: fluentd2vector
  inputs:
  - application:
      namespaces:
      - app-dev
    name: app
  outputs:
  - name: logs
    type: syslog
    url: tcp://rsyslog-server.rsyslog-pj.svc:6514
  pipelines: 
   - name: all
     inputRefs:
     - application
     - infrastructure
     - audit
     - app
     outputRefs:
     - logs
EOF

The rest of the steps in this guide are the same.

BONUS: delete resources

Delete the resources created for the migration that are no longer needed:

$ oc -n openshift-logging delete clusterlogforwarder migration
$ oc -n openshift-logging delete clusterLogging migration
$ oc delete clusterrolebinding collect-app-logs-fluentd2vector
$ oc delete clusterrolebinding collect-infra-logs-fluentd2vector
$ oc delete clusterrolebinding collect-audit-logs-fluentd2vector
$ oc -n openshift-logging delete serviceaccount fluentd2vector
$ oc delete project rsyslog-pj