Migrating the log collector from Fluentd to Vector, reducing the number of duplicated logs in RHOCP 4
Table of Contents
Introduction
How many logs will be duplicated after the migration?
Prerequisites
Current Stack
Deploy a syslog server running in the same OpenShift cluster as the collector pods
The multi log forwarder configuration
Migration for log forwarding using vector collectors
Additional configurations
BONUS: delete resources
Introduction
The following document describes a simplified procedure to migrate the Red Hat OpenShift Logging collector from fluentd to vector. This guide does not claim to cover all possible scenarios, and it's recommended to test the procedure described here in a lab before applying it in a production environment.
After applying the following steps:
- The collector type used will be vector
- Some duplicated logs could exist, but the number should be reduced
Tested with:
- Red Hat OpenShift 4.16.38
- Red Hat OpenShift Logging v5.8.19
Note: for a simpler procedure where duplicated logs will exist, follow the article To migrate default log store from Elasticsearch to Loki in Red Hat OpenShift Logging
Migrating the collector type from fluentd to vector means that the Vector collector will start reading all the logs available on the nodes from the beginning, because the fluentd position files, which record the last position read in each log file, cannot be migrated from the Fluentd to the Vector collectors. This will imply:
- Logs duplicated in the log storage
- A peak of CPU and memory usage on the newly started vector pods, as they immediately try to read all the logs available in the system: old logs plus new logs arriving.
- Impact on the storage used by the Log Store
- 429 Too Many Requests responses if using Loki, as a consequence of the rate limit and stream rate limits. These logs won't be lost, as the requests are retried
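Fluentd's position files are the reason the overlap cannot be avoided: per log file, they record the byte offset already forwarded, in a format Vector does not understand. The following sketch illustrates the pos-file layout; the path, offset, and inode values are invented for illustration:

```shell
# Fluentd (in_tail) records its progress in pos files with the format:
#   <path><TAB><hex_offset><TAB><hex_inode>
# Vector keeps its own checkpoint files and cannot reuse these, so a fresh
# Vector collector starts reading every file again from offset 0.
# Hypothetical pos-file entry (all values invented for illustration):
printf '/var/log/pods/app.log\t000000000001d2f0\t0000000000a4c21f\n' > /tmp/example.pos

# Decode the hex offset to see how many bytes Fluentd had already forwarded:
offset_hex=$(cut -f2 /tmp/example.pos)
echo $((16#$offset_hex))   # prints 119536
```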
For more details, read "Vector and fluentd comparative for help on transition/adoption".
The multi log forwarder feature will be used to run new collector pods with vector as the collector type in parallel, so that the vector checkpoint files are up to date before vector starts forwarding logs to the real log storage.
How many logs will be duplicated after the migration?
Logs duplicated = (last log line read by Fluentd) - (last log line read by Vector visible in the rsyslog server)
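As a rough numeric sketch of this formula (both line counters are invented values for illustration):

```shell
# Suppose Fluentd had forwarded up to line 120000 of a node's logs when it
# was stopped, and the last Vector-forwarded line visible in the rsyslog
# server corresponds to line 115500. The overlap re-sent to the log store
# after the switchover is the difference between the two:
fluentd_last=120000   # assumed: last log line read by Fluentd
vector_last=115500    # assumed: last Vector line visible in rsyslog
echo $((fluentd_last - vector_last))   # prints 4500
```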
Prerequisites
- Red Hat OpenShift Logging Operator installed, at least one of the v5.8 releases
- Ensure sufficient resources on the target nodes to run the fluentd and vector pods in parallel
- Ensure that the definition of spec.collection follows the new API definition, or some commands could need to be modified to adapt the paths.
Current Stack
The current clusterLogging configuration used for this example looks like the one below, representing a fully managed OpenShift Logging stack where only the application and infrastructure logs are collected (the default stack when no clusterLogForwarder CR is set). Read the section "Additional configurations" for cases where audit logs or additional inputs are defined.
Disclaimer: the stack might vary regarding resources/nodes/tolerations/selectors/collector type/backend storage used
$ oc get clusterlogging instance -o yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
--- OUTPUT OMITTED ---
spec:
  collection:
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    type: fluentd
--- OUTPUT OMITTED ---
Deploy a syslog server running in the same OpenShift cluster as the collector pods
NOTE: The syslog server will write the logs to the file /var/log/messages inside the syslog pod. This path is ephemeral storage, that is, storage from the node, so consider one of the following:
- create a PVC and mount it in the rsyslog pod so the node filesystem does not fill up
- or implement rotation of the /var/log/messages file in the rsyslog server pod
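If rotation inside the pod is preferred, a minimal logrotate sketch could look like the following (assuming the logrotate binary is available in the rsyslog image; the size and retention values are illustrative, not recommendations):

```
# Hypothetical /etc/logrotate.d/messages fragment for the rsyslog pod
/var/log/messages {
    size 100M
    rotate 3
    compress
    missingok
    notifempty
    postrotate
        # signal rsyslogd to reopen the log file after rotation
        kill -HUP "$(cat /var/run/rsyslogd.pid)" 2>/dev/null || true
    endscript
}
```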
1. Create additional configuration for rsyslog
$ cat > rsyslog-add.conf << EOF
\$ModLoad imtcp
\$ModLoad imudp
\$InputTCPServerRun 6514
\$UDPServerRun 6514
# Increase the amount of open files rsyslog is allowed, which includes open tcp sockets
# This is important if there are many clients.
# http://www.rsyslog.com/doc/rsconf1_maxopenfiles.html
\$MaxOpenFiles 2048
*.* /var/log/messages
EOF
2. Create all the OpenShift objects
$ oc new-project rsyslog-pj
$ oc new-app --name rsyslog-server --docker-image=registry.access.redhat.com/rhel7/rsyslog
$ oc create sa rsyslog-server-sa
$ oc adm policy add-scc-to-user anyuid -z rsyslog-server-sa
$ oc set serviceaccount deployment/rsyslog-server rsyslog-server-sa
$ oc create cm rsyslog-cm --from-file=rsyslog-add.conf=rsyslog-add.conf
$ oc set volume deployment/rsyslog-server --add -m /etc/rsyslog.d/ --configmap-name=rsyslog-cm
$ oc expose deployment rsyslog-server --port=6514
The multi log forwarder configuration
Grant permissions to read the logs
NOTE: for all the details, read Log Forwarding.
Step 1. Create the serviceAccount to be used in the Log Forwarder:
$ oc create sa fluentd2vector -n openshift-logging
Starting in Red Hat Logging v5.8, three new clusterRoles exist, one for each type of log to be collected:
- collect-application-logs: allows collecting application logs
- collect-infrastructure-logs: allows collecting infrastructure logs
- collect-audit-logs: allows collecting audit logs
In this example, the clusterRoleBindings are created to grant permissions to collect the application and infrastructure logs. If the audit logs are also collected, it will be necessary to create a clusterRoleBinding to the clusterRole collect-audit-logs.
$ oc create clusterrolebinding collect-app-logs-fluentd2vector --clusterrole=collect-application-logs --serviceaccount openshift-logging:fluentd2vector
$ oc create clusterrolebinding collect-infra-logs-fluentd2vector --clusterrole=collect-infrastructure-logs --serviceaccount openshift-logging:fluentd2vector
Step 2. Verify that the clusterRoleBindings were created:
$ oc get clusterrolebinding collect-app-logs-fluentd2vector
NAME                              ROLE                                   AGE
collect-app-logs-fluentd2vector   ClusterRole/collect-application-logs   6m1s
$ oc get clusterrolebinding collect-infra-logs-fluentd2vector
NAME                                ROLE                                      AGE
collect-infra-logs-fluentd2vector   ClusterRole/collect-infrastructure-logs   4m59s
Create the clusterLogging CR
Create a clusterLogging CR to set the resources (limits and requests) and, if needed, the nodeSelector and/or tolerations. In this example, the ClusterLogging CR is called migration.
NOTE: It's important to set limits.{cpu,memory}. When starting, or during the migration, some CPU and/or memory alarms for the pods called migration-* could be triggered, and it could be desired to silence them.
$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "migration"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    type: "vector"
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
EOF
Create the clusterLogForwarder CR
Step 1. Create a clusterLogForwarder CR collecting the same types of logs as fluentd. The clusterLogForwarder must have the same name as the clusterLogging CR; in this example, migration.
Read the section "Additional configurations" for cases where audit logs or additional inputs are defined.
$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: migration
  namespace: openshift-logging
spec:
  serviceAccountName: fluentd2vector
  outputs:
  - name: logs
    type: syslog
    url: tcp://rsyslog-server.rsyslog-pj.svc:6514
  pipelines:
  - name: all
    inputRefs:
    - application
    - infrastructure
    outputRefs:
    - logs
EOF
Step 2. Verify new collectors are running
$ oc get pods -l app.kubernetes.io/instance=migration -n openshift-logging
NAME READY STATUS RESTARTS AGE
migration-54q7t 1/1 Running 0 6m33s
migration-7ph2x 1/1 Running 0 6m33s
migration-g95xx 1/1 Running 0 6m33s
migration-gvwvj 1/1 Running 0 6m33s
migration-lbhj4 1/1 Running 0 6m33s
migration-tp8zf 1/1 Running 0 6m33s
Verify that the logs are arriving at the syslog server
$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -f /var/log/messages
Migration for log forwarding using vector collectors
1. Verify Vector is reading the current logs produced by applications.
Vector needs to read all the logs on the nodes starting from 0, so it can take a while for Vector to catch up with the last logs produced. This can be verified by checking the last log message in the rsyslog server:
$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -1 /var/log/messages
2. Stop the Vector pods
Patching the clusterLogging with a nodeSelector that matches no node unschedules all of its collector pods:
$ oc -n openshift-logging patch clusterlogging/migration --type=merge -p '{"spec":{"collection":{"nodeSelector":{"collectors": "stop"}}}}'
3. Verify the last log entry in the syslog server
$ oc -n rsyslog-pj exec $(oc get pods -o name -n rsyslog-pj) -- tail -1 /var/log/messages
4. Verify in the Log Store that the last log entry in the syslog is also in the Log Store
If the Log Store is Elasticsearch, check in Kibana; if the Log Store is Loki, check in the "OpenShift Console > Observe > Logs" view.
5. Stop the Fluentd pods
Once it's confirmed that the last log entry read by Vector was also read by Fluentd and is present in the Log Store, stop the Fluentd pods:
$ oc -n openshift-logging patch clusterlogging/instance --type=merge -p '{"spec":{"collection":{"nodeSelector":{"collectors": "stop"}}}}'
6. Copy the vector checkpoints to the clusterLogging instance path and verify them:
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "cp -Ra /var/lib/vector/openshift-logging/migration/* /var/lib/vector/" ; done
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "ls -ld /var/lib/vector/*"; done
7. Change collector type from fluentd to vector
Change the collector type from Fluentd to Vector in the clusterlogging instance:
$ oc -n openshift-logging patch clusterlogging/instance --type=merge -p '{"spec":{"collection":{"type":"vector"}}}'
8. Remove the nodeSelector in the clusterlogging/instance to start the vector pods:
$ oc -n openshift-logging patch clusterlogging/instance --type='json' -p='[{"op": "remove", "path": "/spec/collection/nodeSelector"}]'
9. Verify the collector pods are running and using Vector as collector type:
$ pod=$(oc -n openshift-logging get pods -l component=collector -o jsonpath='{.items[0].metadata.name}')
$ oc -n openshift-logging get pod $pod -o yaml |grep image:
image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b6691ffe1e58a570cbe6f5e12dec55305880935e14e84de1a361931b09b36047
image: registry.redhat.io/openshift-logging/vector-rhel9@sha256:b6691ffe1e58a570cbe6f5e12dec55305880935e14e84de1a361931b09b36047
Additional configurations
This is a quick example for when the clusterLogForwarder CR instance exists and it's collecting:
- audit logs
- an input collecting only logs from the namespace app-dev
Original clusterLogForwarder CR instance example:
$ oc get clusterlogforwarder instance -o yaml -n openshift-logging
[...]
spec:
  inputs:
  - application:
      namespaces:
      - app-dev
    name: app
  outputs:
  - http:
      method: POST
    name: httpreceiver
    type: http
    url: http://httpreceiver.svc.cluster.local:9880/kubernetes
  pipelines:
  - inputRefs:
    - application
    - infrastructure
    - audit
    name: all-logs
    outputRefs:
    - default
  - inputRefs:
    - app
    name: external-app
    outputRefs:
    - httpreceiver
--- OUTPUT OMITTED ---
Step 1. Grant permissions to read the audit logs
$ oc create clusterrolebinding collect-audit-logs-fluentd2vector --clusterrole=collect-audit-logs --serviceaccount openshift-logging:fluentd2vector
Step 2. Create the clusterLogForwarder CR migration
The inputs and inputRefs from the clusterLogForwarder CR instance must also exist in the migration CR, so that the same logs are forwarded to the syslog server:
$ cat << EOF |oc create -f -
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: migration
  namespace: openshift-logging
spec:
  serviceAccountName: fluentd2vector
  inputs:
  - application:
      namespaces:
      - app-dev
    name: app
  outputs:
  - name: logs
    type: syslog
    url: tcp://rsyslog-server.rsyslog-pj.svc:6514
  pipelines:
  - name: all
    inputRefs:
    - application
    - infrastructure
    - audit
    - app
    outputRefs:
    - logs
EOF
The rest of the steps in this guide are the same.
BONUS: delete resources
Delete the resources created for the migration that are no longer needed:
$ oc -n openshift-logging delete clusterlogforwarder migration
$ oc -n openshift-logging delete clusterLogging migration
$ oc delete clusterrolebinding collect-app-logs-fluentd2vector
$ oc delete clusterrolebinding collect-infra-logs-fluentd2vector
$ oc delete clusterrolebinding collect-audit-logs-fluentd2vector
$ oc -n openshift-logging delete serviceaccount fluentd2vector
$ oc delete project rsyslog-pj