Migrating OpenShift Logging Operator log store from Elasticsearch to Loki in Red Hat OpenShift Container Platform 4

The following document describes how to migrate the OpenShift Logging storage service from Elasticsearch to LokiStack. The guide covers only the steps needed to switch log forwarding from Elasticsearch to LokiStack; it does not cover migrating data between the two. The goal is to keep both log storage stacks running in parallel until the informed user can confidently shut down Elasticsearch.

In summary, after applying the following steps:

  • The old logs will still be served by Elasticsearch and visible only through Kibana.
  • The new logs will be served by LokiStack and visible through the OpenShift Console logs pages (e.g. Admin -> Observe -> Logs).

Assumptions

  1. Red Hat OpenShift Logging Operator is already installed and upgraded to the latest available version for your current OpenShift cluster version that still supports the use of Elasticsearch for log storage. The last release stream that supports Elasticsearch is OpenShift Logging 5.8.z.
  2. The OpenShift Elasticsearch Operator is installed and in use by the current ClusterLogging instance
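A quick way to check the installed operator versions, assuming the default installation namespaces for both operators:

$ oc -n openshift-logging get csv
$ oc -n openshift-operators-redhat get csv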

Loki Prerequisites

Storage

Loki uses different types of storage for its long-term and short-term (temporary) storage needs.

Long-term storage

Long-term storage requires that Loki has access to a supported object store. Supported options include:

  • AWS S3
  • Google Cloud Storage
  • Azure
  • Swift
  • S3-compatible (such as MinIO)
  • OpenShift Data Foundation

The choice and preparation of an appropriate object store must be made before attempting to install Loki. As part of the Loki installation process, secrets containing the credentials for accessing the object store should be created before creating the LokiStack instance, along with other resources such as ObjectBucketClaims and ConfigMaps, depending on the object store provider chosen.
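For example, when OpenShift Data Foundation is the chosen provider, an ObjectBucketClaim similar to the following minimal sketch can be used to provision a bucket (the claim name is illustrative, and the storage class assumes the default Multicloud Object Gateway class provided by ODF):

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: loki-bucket-odf        # illustrative name
  namespace: openshift-logging
spec:
  generateBucketName: loki-bucket-odf
  storageClassName: openshift-storage.noobaa.io   # default MCG object bucket StorageClass in ODF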

Short-term storage

Short-term storage uses standard PersistentVolumeClaims. It is configured by specifying an appropriate StorageClass to use. Block storage is preferred for performance reasons. A StorageClass that provides object storage cannot be used.
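To list the StorageClasses available in the cluster and pick one backed by a block (or file) storage provisioner:

$ oc get storageclass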

CPU and Memory Requirements

See the "Loki sizing" table in the LokiStack deployment sizing section of the official Red Hat OpenShift Logging documentation.

NOTE: By default, Loki will deploy its workloads on worker nodes unless appropriate nodeSelectors and tolerations are specified in the LokiStack instance when it is created. If your cluster has dedicated infrastructure nodes, see the "Scheduling Loki components on Infrastructure nodes" section below.

Current Stack

Note: if Fluentd is the collector type, consider reading the Red Hat Knowledge Base article "Migrating the log collector from Fluentd to Vector reducing the number of logs duplicated in RHOCP 4".

Assume the current stack looks like the example below, which represents a fully managed OpenShift Logging stack with Elasticsearch as the log store and Kibana for visualization, covering collection, forwarding, storage, and visualization.

Disclaimer: the stack might vary regarding the resources, nodes, tolerations, selectors, collector type, and backend storage used.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: gp2
        size: 80Gi
      resources:
        requests:
          memory: 16Gi
        limits:
          memory: 16Gi
      redundancyPolicy: "SingleRedundancy"
    retentionPolicy:
      application:
        maxAge: 24h
      audit:
        maxAge: 24h
      infra:
        maxAge: 24h
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
  [...]

If using ClusterLogForwarder to forward audit logs

If the Forwarding audit logs to the log store guide was used to forward audit logs to the default log store, there is no need to change anything in the ClusterLogForwarder resource. The collector pods will be reconfigured to forward new audit logs to LokiStack as well. A reference example is shown below.
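For reference, a ClusterLogForwarder that forwards audit logs to the default log store typically looks similar to this minimal sketch (the pipeline name is illustrative):

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
  - name: audit-to-default   # illustrative pipeline name
    inputRefs:
    - audit
    outputRefs:
    - default                # "default" always points at the log store defined in ClusterLogging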

Installing and configuring Loki

Step 1: Install the Loki Operator

The Loki Operator can be installed using one of two methods: from OperatorHub in the OpenShift web console, or from the CLI by creating the required OLM resources, as sketched below.
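For the CLI method, a minimal sketch of the required OLM resources looks like the following; the channel shown is an assumption and must match the Logging version available in your cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: stable-5.9        # assumption: use the channel that matches your Logging version
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace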

Step 2: Create the Object Store secret

Create the secret containing the details for accessing your chosen object store provider by following the appropriate section under "Loki object storage" in the official Red Hat OpenShift Logging documentation.
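For example, for AWS S3 the secret could be created as follows; the values are placeholders and the key names follow the AWS section of that documentation:

$ oc -n openshift-logging create secret generic logging-loki-aws \
    --from-literal=bucketnames="<bucket_name>" \
    --from-literal=endpoint="<aws_bucket_endpoint>" \
    --from-literal=region="<aws_region_of_your_bucket>" \
    --from-literal=access_key_id="<aws_access_key_id>" \
    --from-literal=access_key_secret="<aws_access_key_secret>"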

Step 3: Create the LokiStack instance

Define the following variables based on your environment, requirements, and the configurations made in the previous steps:

  • LOKI_SIZE: see the "Loki sizing" table. Examples: 1x.extra-small, 1x.small, 1x.medium
  • LOKI_LONGTERM_STORAGE_TYPE: see the "Secret type quick reference" table. Examples: s3, azure, gcs, swift
  • LOKI_LONGTERM_STORAGE_SECRET: the secret created in the previous "Create the Object Store secret" step. Examples: logging-loki-aws, logging-loki-azure, logging-loki-odf, etc.
  • LOKI_SHORTTERM_STORAGECLASS: a StorageClass for a block or file storage provisioner in the cluster. Examples: thin, gp2, gp3, managed-premium, etc.

Substitute the variables into the basic LokiStack YAML template below:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: ${LOKI_SIZE}
  storage:
    schemas:
    - version: v13
      effectiveDate: "2022-06-01"
    secret:
      name: ${LOKI_LONGTERM_STORAGE_SECRET}
      type: ${LOKI_LONGTERM_STORAGE_TYPE}
  storageClassName: ${LOKI_SHORTTERM_STORAGECLASS}
  tenants:
    mode: openshift-logging
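
One way to perform the substitution, assuming the template above was saved to a file named lokistack-template.yaml (an illustrative file name) and that envsubst from the gettext package is available:

$ export LOKI_SIZE="1x.small" \
         LOKI_LONGTERM_STORAGE_TYPE="s3" \
         LOKI_LONGTERM_STORAGE_SECRET="logging-loki-aws" \
         LOKI_SHORTTERM_STORAGECLASS="gp3"
$ envsubst < lokistack-template.yaml | oc apply -f -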

Advanced LokiStack configurations

Scheduling Loki components on Infrastructure nodes

To schedule all Loki workloads on infrastructure nodes, the appropriate nodeSelector and tolerations must be applied. For standard "infra" nodes that carry the taint node-role.kubernetes.io/infra, append the following to the end of the basic LokiStack YAML above. Note that the template section must be indented to the same level as tenants in that YAML:

    template:
      compactor:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      distributor:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      gateway:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      indexGateway:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      ingester:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      querier:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      queryFrontend:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists
      ruler:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          operator: Exists

See the Red Hat OpenShift Logging documentation section on Loki pod placement for more details.
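
Once the LokiStack is created or updated with this template, a quick way to verify that the Loki pods landed on the intended nodes:

$ oc -n openshift-logging get pods -o wide | grep logging-loki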

Disconnect Elasticsearch and Kibana CRs from ClusterLogging

To ensure Elasticsearch and Kibana continue to run on the cluster while ClusterLogging is switched from them to LokiStack and the OpenShift Console, the custom resources must first be disconnected from being owned by ClusterLogging.

Step 1: Temporarily set ClusterLogging to the Unmanaged state

$ oc -n openshift-logging patch clusterlogging/instance -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge

Step 2: Remove ClusterLogging OwnerReferences from Elasticsearch resource

The following command ensures that ClusterLogging no longer owns the Elasticsearch resource. This means that updates to the ClusterLogging resource's logStore field will no longer be applied to the Elasticsearch resource.

$ oc -n openshift-logging patch elasticsearch/elasticsearch -p '{"metadata":{"ownerReferences": []}}' --type=merge

Step 3: Remove ClusterLogging OwnerReferences from Kibana resource

The following command ensures that ClusterLogging no longer owns the Kibana resource. This means that updates to the ClusterLogging resource's visualization field will no longer be applied to the Kibana resource.

$ oc -n openshift-logging patch kibana/kibana -p '{"metadata":{"ownerReferences": []}}' --type=merge
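
Both resources should now report no owner references; the following commands should return empty output:

$ oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{.metadata.ownerReferences}'
$ oc -n openshift-logging get kibana kibana -o jsonpath='{.metadata.ownerReferences}'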

Step 4: Back up the Elasticsearch and Kibana resources

To ensure that no accidental deletions destroy the previous storage and visualization components, namely Elasticsearch and Kibana, the following steps describe how to back up the resources (this requires the small yq utility):

Elasticsearch:

$ oc -n openshift-logging get elasticsearch elasticsearch -o yaml \
  | yq -r 'del(.status, .metadata.resourceVersion, .metadata.uid, .metadata.generation, .metadata.creationTimestamp, .metadata.selfLink)' > /tmp/cr-elasticsearch.yaml

Kibana:

$ oc -n openshift-logging get kibana kibana -o yaml \
  | yq -r 'del(.status, .metadata.resourceVersion, .metadata.uid, .metadata.generation, .metadata.creationTimestamp, .metadata.selfLink)' > /tmp/cr-kibana.yaml

Switch ClusterLogging to LokiStack

Step 1: Switch log storage to LokiStack

Applying the manifest below makes several changes to the ClusterLogging resource:

  1. It sets the management state back to Managed.

  2. It switches the logStore spec from elasticsearch to lokistack. In turn, this restarts the collector pods so that they start forwarding logs to LokiStack from now on.

  3. It removes the visualization spec (the template below shows how to optionally keep it for as long as Kibana is still needed). In turn, the cluster-logging-operator installs the logging-view-plugin, which enables observing LokiStack logs in the OpenShift Console.

  4. Replace the current spec.collection section with the one configured in the running cluster, as indicated in the template.

    $ cat << EOF |oc replace -f -
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
      namespace: "openshift-logging"
    spec:
      managementState: "Managed"
      logStore:
        type: "lokistack"
        lokistack:
          name: logging-loki
      collection:   <------------------ replace with the current collection configuration
      [...]
      visualization: #Keep this section as long as you need to keep Kibana.
        kibana:
          replicas: 1
        type: kibana
    EOF
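
After applying the manifest, a quick sanity check confirms that the ClusterLogging resource now points at LokiStack and that the collector pods are being restarted:

$ oc -n openshift-logging get clusterlogging instance -o jsonpath='{.spec.logStore.type}'
$ oc -n openshift-logging get pods -o wide | grep collector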
    

Step 2: Re-instantiate Kibana resource

If the visualization field was removed entirely in the previous step, in favor of letting the operator install the OpenShift Console integration, the same operator will also remove the Kibana resource. This is unfortunate, but it is a non-critical issue as long as a backup of the Kibana resource exists: the operator removes the Kibana resource named kibana from openshift-logging automatically, without checking any owner references. This behavior was correct as long as Kibana was the only supported visualization component in OpenShift Logging. Re-apply the backup to restore Kibana:

$ oc -n openshift-logging apply -f /tmp/cr-kibana.yaml

Step 3: Enable the console view plugin

If the console view plugin is not already enabled, it should be enabled so that the logs can be viewed from the RHOCP Console -> Observe -> Logs.
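
Note that the merge patch below replaces the entire spec.plugins list, so if other console plugins are already enabled they must be included in the list as well. To check which plugins are currently enabled:

$ oc get consoles.operator.openshift.io cluster -o jsonpath='{.spec.plugins}'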

$ oc patch consoles.operator.openshift.io cluster  --type=merge --patch '{ "spec": { "plugins": ["logging-view-plugin"] } }'

Delete the Elasticsearch stack

When the retention period for the logs stored in the Elasticsearch log store has expired and no more logs are visible in the Kibana instance, it is possible to remove the old stack to release resources.

Step 1: Delete Elasticsearch and Kibana resources

$ oc -n openshift-logging delete kibana/kibana elasticsearch/elasticsearch

Step 2: Delete the PVCs used by the Elasticsearch instances

$ oc delete -n openshift-logging pvc -l logging-cluster=elasticsearch
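
To confirm that the old Elasticsearch and Kibana workloads and their storage are gone, the following commands should show no remaining Elasticsearch or Kibana pods and no leftover Elasticsearch PVCs:

$ oc -n openshift-logging get pods | grep -E 'elasticsearch|kibana'
$ oc -n openshift-logging get pvc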