How to migrate Vector checkpoints in RHOCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat OpenShift Logging (RHOL)
    • 5
    • 6
  • Vector

Issue

  • After upgrading Logging from v5.8 to Logging v6, duplicated logs exist in the log storage
  • After upgrading Logging from v5.9 to Logging v6, duplicated logs exist in the log storage
  • After upgrading Logging, 429 Too Many Requests errors are observed and the storage used in the log storage increases significantly
  • A large spike in CPU and memory usage is observed after upgrading Logging

Resolution

Scenario 1. When migrating from Fluentd to Vector

Review the Red Hat Knowledge Article "Migrating the log collector from Fluentd to Vector reducing the number of duplicated logs in RHOCP 4".

Scenario 2. When migrating using Vector from RHOL v5.8 to v6

Follow the steps in the Red Hat Knowledge Article "How to transition the collectors and the default log store from Red Hat OpenShift Logging 5 to 6". In "Step 5: Delete the ClusterLogging instance and deploy the ClusterLogForwarder observability Custom Resource", at the point "Move the Vector checkpoints for the clusterLogging CR instance", execute the following:

  1. Download the file migrate_checkpoints_v58tov6_0.txt.
  2. Open the downloaded file and adjust the variables ns and cr. By default, the script assumes that the namespace is openshift-logging (ns variable) and that the ClusterLogForwarder Custom Resource (CR) is named collector (cr variable).
  3. Rename it to migrate_checkpoints_v58tov6_0.sh:
$ mv migrate_checkpoints_v58tov6_0.txt migrate_checkpoints_v58tov6_0.sh
  4. Give it execution permissions:
$ chmod 755 migrate_checkpoints_v58tov6_0.sh
  5. Execute it:
$ ./migrate_checkpoints_v58tov6_0.sh
  6. The script above does not copy checkpoints to the input_infrastructure_container directory. Use the following commands to copy them from the input_application_container directory to the input_infrastructure_container directory:
$ ns="openshift-logging"
$ cr="collector"
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "cp -Ra /var/lib/vector/$ns/$cr/input_application_container/ /var/lib/vector/$ns/$cr/input_infrastructure_container/"; done
  7. (Optional) Apart from the input_infrastructure_container directory handled in the previous step, the script migrates checkpoints for the default application, infrastructure, and audit inputs. If the ClusterLogForwarder defines custom inputs (for example, an input that selects specific namespaces), the script does not populate the checkpoint directories for them. If you use custom inputs, populate their checkpoint directories manually before continuing with the next steps. The following example commands copy the default input_application_container directory to a new input_<custom_input_name>_container directory:
$ ns="openshift-logging"
$ cr="collector"
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "cp -Ra /var/lib/vector/$ns/$cr/input_application_container/ /var/lib/vector/$ns/$cr/input_<custom_input_name>_container/"; done
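Steps 6 and 7 repeat the same copy with a different destination directory. The following is a minimal sketch, not part of the original procedure, that wraps that copy in a hypothetical helper named copy_checkpoints. It assumes it runs on the node itself (for example, inside the oc debug ... chroot /host shell) and that the Vector data directory is /var/lib/vector; the root directory is a parameter so the logic can be rehearsed on a scratch directory first. Unlike the one-liners above, it creates the destination with mkdir -p and copies the contents of the source ("src/."), so running it twice, or against a destination that already exists, does not nest input_application_container inside the destination.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RHOL): copy Vector checkpoints from one
# input directory to another under <root>/<ns>/<cr>/.
#   $1 root  Vector data directory, normally /var/lib/vector (assumption)
#   $2 ns    ClusterLogForwarder namespace, e.g. openshift-logging
#   $3 cr    ClusterLogForwarder CR name, e.g. collector
#   $4 src   source input name, e.g. application
#   $5 dst   destination input name, e.g. infrastructure or <custom_input_name>
copy_checkpoints() {
  local root="$1" ns="$2" cr="$3" src="$4" dst="$5"
  local base="$root/$ns/$cr"
  local from="$base/input_${src}_container"
  local to="$base/input_${dst}_container"

  if [ ! -d "$from" ]; then
    echo "source checkpoint directory not found: $from" >&2
    return 1
  fi
  # Create the destination first, then copy the *contents* of the source
  # ("$from/.") so the result is the same whether or not "$to" already exists.
  mkdir -p "$to"
  cp -Ra "$from/." "$to/"
}
```

Assuming the function is defined in the local bash session, it could be sent to every node with declare -f, for example: $ for node in $(oc get nodes -o name); do oc debug "$node" -- chroot /host /bin/bash -c "$(declare -f copy_checkpoints); copy_checkpoints /var/lib/vector openshift-logging collector application infrastructure"; done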

When the script and the commands above have finished, continue with the next steps in the Red Hat Knowledge Article "How to transition the collectors and the default log store from Red Hat OpenShift Logging 5 to 6" to complete the migration.
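Before continuing, it can help to confirm on each node that the expected checkpoint directories exist and are not empty. A minimal sketch, assuming the input_<name>_container directory names used in the steps above under /var/lib/vector/$ns/$cr; the helper name verify_checkpoints is hypothetical, and the caller passes the list of input names to check.

```shell
#!/usr/bin/env bash
# Hypothetical check: report which input checkpoint directories are missing or
# empty under <root>/<ns>/<cr>/. Returns non-zero if any of them is.
#   $1 root  normally /var/lib/vector (assumption)
#   $2 ns    ClusterLogForwarder namespace
#   $3 cr    ClusterLogForwarder CR name
#   $4...    input names, e.g. application infrastructure
verify_checkpoints() {
  local root="$1" ns="$2" cr="$3"
  shift 3
  local status=0 input dir
  for input in "$@"; do
    dir="$root/$ns/$cr/input_${input}_container"
    if [ ! -d "$dir" ]; then
      echo "MISSING $dir"
      status=1
    elif [ -z "$(ls -A "$dir")" ]; then
      echo "EMPTY   $dir"
      status=1
    else
      echo "OK      $dir"
    fi
  done
  return "$status"
}
```

Assuming the function is defined locally, one way to run it on a node is: $ oc debug <node> -- chroot /host /bin/bash -c "$(declare -f verify_checkpoints); verify_checkpoints /var/lib/vector openshift-logging collector application infrastructure"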

Scenario 3. When migrating using Vector from RHOL v5.9 to v6

Follow the steps in the Red Hat Knowledge Article "How to transition the collectors and the default log store from Red Hat OpenShift Logging 5 to 6". In "Step 5: Delete the ClusterLogging instance and deploy the ClusterLogForwarder observability Custom Resource", at the point "Move the Vector checkpoints for the clusterLogging CR instance", execute the following:

$ ns="openshift-logging"
$ cr="collector"
$ for node in $(oc get nodes -o name); do oc debug $node -- chroot /host /bin/bash -c "mkdir -p /var/lib/vector/$ns/$cr" ; done
$ for node in $(oc get nodes -o name); do oc debug $node -- chroot /host /bin/bash -c "chmod -R 755 /var/lib/vector/$ns" ; done
$ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug $node -- chroot /host /bin/bash -c "cp -Ra /var/lib/vector/input* /var/lib/vector/$ns/$cr/"; done
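The three loops above perform, on each node, a mkdir, a chmod, and a copy of the /var/lib/vector/input* directories. A minimal sketch of the same sequence as a single hypothetical function, migrate_v59_checkpoints, with the root directory (normally /var/lib/vector) added as a parameter so the copy can be rehearsed on a scratch directory before touching a node. It also reports an error when no input* directories are found instead of letting cp fail on an unmatched glob.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the three Scenario 3 loops for one node:
# copy the v5.9 checkpoint directories (<root>/input*) into <root>/<ns>/<cr>/.
#   $1 root  normally /var/lib/vector (assumption)
#   $2 ns    e.g. openshift-logging
#   $3 cr    e.g. collector
migrate_v59_checkpoints() {
  local root="$1" ns="$2" cr="$3"
  local target="$root/$ns/$cr"
  local found=0 dir

  mkdir -p "$target"
  chmod -R 755 "$root/$ns"
  # Copy each top-level input* directory into the target; "cp -Ra dir target/"
  # lands in target/<name>, merging with it if it already exists.
  for dir in "$root"/input*; do
    [ -e "$dir" ] || continue   # glob matched nothing
    cp -Ra "$dir" "$target/"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no input* checkpoint directories found in $root" >&2
    return 1
  fi
}
```

Assuming the function is defined in the local bash session, it could replace the three loops with: $ for node in $(oc get nodes -o name); do echo "### $node ###"; oc debug "$node" -- chroot /host /bin/bash -c "$(declare -f migrate_v59_checkpoints); migrate_v59_checkpoints /var/lib/vector openshift-logging collector"; done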

When the commands above have finished, continue with the next steps in the Red Hat Knowledge Article "How to transition the collectors and the default log store from Red Hat OpenShift Logging 5 to 6" to complete the migration.

Root Cause

Scenario 1. When migrating from Fluentd to Vector

The Fluentd position files, which record the log files that were opened and the last position read in each of them, are not in a format that Vector understands, so Vector cannot resume from those positions and reads the log files again.

Scenario 2. When migrating using Vector from RHOL v5.8 to v6

The Vector checkpoint path changed from /var/lib/vector/raw_* to /var/lib/vector/$ns/$cr/input_*, so the Logging 6 collector does not find the existing checkpoints.

Scenario 3. When migrating using Vector from RHOL v5.9 to v6

The Vector checkpoint path changed from /var/lib/vector/input* to /var/lib/vector/$ns/$cr/input_*, so the Logging 6 collector does not find the existing checkpoints.

Note: ns=<namespace> and cr=<ClusterLogForwarder Custom Resource name>.
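The path patterns above can be used to check which checkpoint layout is present on a node before and after the migration. A minimal sketch with hypothetical function names (any_exists, detect_checkpoint_layout); it only tests for the patterns listed in this section (raw_* for v5.8, input* directly under /var/lib/vector for v5.9, and $ns/$cr/input_* for v6) and does not inspect the checkpoint contents.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of RHOL): report which Vector checkpoint
# layouts described in this article exist under <root>.
#   $1 root  normally /var/lib/vector (assumption)
#   $2 ns    ClusterLogForwarder namespace, e.g. openshift-logging
#   $3 cr    ClusterLogForwarder CR name, e.g. collector

# Called with the result of an unquoted glob: if the glob matched nothing,
# the shell passes the pattern through literally, and that path does not exist.
any_exists() {
  [ -e "$1" ]
}

detect_checkpoint_layout() {
  local root="$1" ns="$2" cr="$3" found=0
  if any_exists "$root"/raw_*; then
    echo "v5.8: $root/raw_*"
    found=1
  fi
  if any_exists "$root"/input*; then
    echo "v5.9: $root/input*"
    found=1
  fi
  if any_exists "$root/$ns/$cr"/input_*; then
    echo "v6:   $root/$ns/$cr/input_*"
    found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "none"
  fi
}
```

After a successful migration, both the v5.9 (or v5.8) and the v6 lines may be printed, because the commands in this article copy the checkpoints rather than move them. Assuming both functions are defined locally, they could be run on a node with: $ oc debug <node> -- chroot /host /bin/bash -c "$(declare -f any_exists detect_checkpoint_layout); detect_checkpoint_layout /var/lib/vector openshift-logging collector"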


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.