Configuring logging
Configuring log forwarding and LokiStack
Abstract
Chapter 1. Configuring log forwarding
The ClusterLogForwarder (CLF) allows users to configure forwarding of logs to various destinations. It provides a flexible way to select log messages from different sources, send them through a pipeline that can transform or filter them, and forward them to one or more outputs.
Key Functions of the ClusterLogForwarder
- Selects log messages using inputs
- Forwards logs to external destinations using outputs
- Filters, transforms, and drops log messages using filters
- Defines log forwarding pipelines connecting inputs, filters and outputs
1.1. Setting up log collection
This release of Cluster Logging requires administrators to explicitly grant log collection permissions to the service account associated with ClusterLogForwarder. This was not required in previous releases for the legacy logging scenario, which consisted of a ClusterLogging resource and, optionally, a ClusterLogForwarder.logging.openshift.io resource.
The Red Hat OpenShift Logging Operator provides collect-audit-logs, collect-application-logs, and collect-infrastructure-logs cluster roles, which enable the collector to collect audit logs, application logs, and infrastructure logs respectively.
Set up log collection by binding the required cluster roles to your service account.
1.1.1. Legacy service accounts
To use the existing legacy service account logcollector, create the following ClusterRoleBinding:
$ oc adm policy add-cluster-role-to-user collect-application-logs system:serviceaccount:openshift-logging:logcollector
$ oc adm policy add-cluster-role-to-user collect-infrastructure-logs system:serviceaccount:openshift-logging:logcollector
Additionally, create the following ClusterRoleBinding if collecting audit logs:
$ oc adm policy add-cluster-role-to-user collect-audit-logs system:serviceaccount:openshift-logging:logcollector
1.1.2. Creating service accounts
Prerequisites
- The Red Hat OpenShift Logging Operator is installed in the openshift-logging namespace.
- You have administrator permissions.
Procedure
- Create a service account for the collector. If you want to write logs to a storage system that requires a token for authentication, you must include a token in the service account.
- Bind the appropriate cluster roles to the service account:
Example binding command
$ oc adm policy add-cluster-role-to-user <cluster_role_name> system:serviceaccount:<namespace_name>:<service_account_name>
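The same result can be expressed declaratively. The following is a minimal sketch rather than an official template: the service account name collector and the binding name collector-application-logs are placeholders chosen for illustration, while collect-application-logs is one of the cluster roles described earlier in this section.

```yaml
# Hypothetical names: "collector" and "collector-application-logs" are
# illustrative and do not come from the product documentation.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: collector
  namespace: openshift-logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: collector-application-logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-application-logs   # cluster role provided by the Operator
subjects:
  - kind: ServiceAccount
    name: collector
    namespace: openshift-logging
```

Applying this file with oc apply -f should have the same effect as the oc adm policy command for one role. A similar binding would be needed for each additional log type that you collect.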
1.1.2.1. Cluster role binding for your service account
The role_binding.yaml file binds the ClusterLogging Operator’s ClusterRole to a specific ServiceAccount, allowing it to manage Kubernetes resources cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: manager-rolebinding
roleRef: # 1
  apiGroup: rbac.authorization.k8s.io # 2
  kind: ClusterRole # 3
  name: cluster-logging-operator # 4
subjects: # 5
  - kind: ServiceAccount # 6
    name: cluster-logging-operator # 7
    namespace: openshift-logging # 8
- 1
- roleRef: References the ClusterRole to which the binding applies.
- 2
- apiGroup: Indicates the RBAC API group, specifying that the ClusterRole is part of Kubernetes' RBAC system.
- 3
- kind: Specifies that the referenced role is a ClusterRole, which applies cluster-wide.
- 4
- name: The name of the ClusterRole being bound to the ServiceAccount, here cluster-logging-operator.
- 5
- subjects: Defines the entities (users or service accounts) that are being granted the permissions from the ClusterRole.
- 6
- kind: Specifies that the subject is a ServiceAccount.
- 7
- name: The name of the ServiceAccount being granted the permissions.
- 8
- namespace: Indicates the namespace where the ServiceAccount is located.
1.1.2.2. Writing application logs
The write-application-logs-clusterrole.yaml file defines a ClusterRole that grants permissions to write application logs to the Loki logging application.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-logging-write-application-logs
rules: # 1
  - apiGroups: # 2
      - loki.grafana.com # 3
    resources: # 4
      - application # 5
    resourceNames: # 6
      - logs # 7
    verbs: # 8
      - create # 9
- 1
- rules: Specifies the permissions granted by this ClusterRole.
- 2
- apiGroups: Refers to the API group loki.grafana.com, which relates to the Loki logging system.
- 3
- loki.grafana.com: The API group for managing Loki-related resources.
- 4
- resources: The resource type that the ClusterRole grants permission to interact with.
- 5
- application: Refers to the application resources within the Loki logging system.
- 6
- resourceNames: Specifies the names of resources that this role can manage.
- 7
- logs: Refers to the log resources that can be created.
- 8
- verbs: The actions allowed on the resources.
- 9
- create: Grants permission to create new logs in the Loki system.
1.1.2.3. Writing audit logs
The write-audit-logs-clusterrole.yaml file defines a ClusterRole that grants permissions to create audit logs in the Loki logging system.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-logging-write-audit-logs
rules: # 1
  - apiGroups: # 2
      - loki.grafana.com # 3
    resources: # 4
      - audit # 5
    resourceNames: # 6
      - logs # 7
    verbs: # 8
      - create # 9
- 1
- rules: Defines the permissions granted by this ClusterRole.
- 2
- apiGroups: Specifies the API group loki.grafana.com.
- 3
- loki.grafana.com: The API group responsible for Loki logging resources.
- 4
- resources: Refers to the resource type this role manages, in this case, audit.
- 5
- audit: Specifies that the role manages audit logs within Loki.
- 6
- resourceNames: Defines the specific resources that the role can access.
- 7
- logs: Refers to the logs that can be managed under this role.
- 8
- verbs: The actions allowed on the resources.
- 9
- create: Grants permission to create new audit logs.
1.1.2.4. Writing infrastructure logs
The write-infrastructure-logs-clusterrole.yaml file defines a ClusterRole that grants permission to create infrastructure logs in the Loki logging system.
Sample YAML
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-logging-write-infrastructure-logs
rules: # 1
  - apiGroups: # 2
      - loki.grafana.com # 3
    resources: # 4
      - infrastructure # 5
    resourceNames: # 6
      - logs # 7
    verbs: # 8
      - create # 9
- 1
- rules: Specifies the permissions this ClusterRole grants.
- 2
- apiGroups: Specifies the API group for Loki-related resources.
- 3
- loki.grafana.com: The API group managing the Loki logging system.
- 4
- resources: Defines the resource type that this role can interact with.
- 5
- infrastructure: Refers to infrastructure-related resources that this role manages.
- 6
- resourceNames: Specifies the names of resources this role can manage.
- 7
- logs: Refers to the log resources related to infrastructure.
- 8
- verbs: The actions permitted by this role.
- 9
- create: Grants permission to create infrastructure logs in the Loki system.
1.1.2.5. ClusterLogForwarder editor role
The clusterlogforwarder-editor-role.yaml file defines a ClusterRole that allows users to manage ClusterLogForwarders in OpenShift.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: clusterlogforwarder-editor-role
rules: # 1
  - apiGroups: # 2
      - observability.openshift.io # 3
    resources: # 4
      - clusterlogforwarders # 5
    verbs: # 6
      - create # 7
      - delete # 8
      - get # 9
      - list # 10
      - patch # 11
      - update # 12
      - watch # 13
- 1
- rules: Specifies the permissions this ClusterRole grants.
- 2
- apiGroups: Refers to the OpenShift-specific API group.
- 3
- observability.openshift.io: The API group for managing observability resources, like logging.
- 4
- resources: Specifies the resources this role can manage.
- 5
- clusterlogforwarders: Refers to the log forwarding resources in OpenShift.
- 6
- verbs: Specifies the actions allowed on the ClusterLogForwarders.
- 7
- create: Grants permission to create new ClusterLogForwarders.
- 8
- delete: Grants permission to delete existing ClusterLogForwarders.
- 9
- get: Grants permission to retrieve information about specific ClusterLogForwarders.
- 10
- list: Allows listing all ClusterLogForwarders.
- 11
- patch: Grants permission to partially modify ClusterLogForwarders.
- 12
- update: Grants permission to update existing ClusterLogForwarders.
- 13
- watch: Grants permission to monitor changes to ClusterLogForwarders.
1.2. Modifying log level in collector
To modify the log level in the collector, set the observability.openshift.io/log-level annotation to one of the following values: trace, debug, info, warn, error, or off.
Example log level annotation
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  annotations:
    observability.openshift.io/log-level: debug
# ...
1.3. Managing the Operator
The ClusterLogForwarder resource has a managementState field that controls whether the Operator actively manages its resources or leaves them unmanaged:
- Managed
- (default) The Operator will drive the logging resources to match the desired state in the CLF spec.
- Unmanaged
- The Operator will not take any action related to the logging components.
This allows administrators to temporarily pause log forwarding by setting managementState to Unmanaged.
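As a rough illustration of the field described above, the following fragment sets managementState on a forwarder. The resource name collector is reused from the annotation example and the namespace is an assumption; only the managementState line is the point of the sketch.

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector                # illustrative name
  namespace: openshift-logging   # assumed namespace
spec:
  # Unmanaged: the Operator stops reconciling the logging components.
  # Set the value back to Managed (the default) to resume.
  managementState: Unmanaged
# ...
```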
1.4. Structure of the ClusterLogForwarder
The CLF has a spec section that contains the following key components:
- Inputs
- Select log messages to be forwarded. Built-in input types application, infrastructure, and audit forward logs from different parts of the cluster. You can also define custom inputs.
- Outputs
- Define destinations to forward logs to. Each output has a unique name and type-specific configuration.
- Pipelines
- Define the path logs take from inputs, through filters, to outputs. Pipelines have a unique name and consist of a list of input, output and filter names.
- Filters
- Transform or drop log messages in the pipeline. Users can define filters that match certain log fields and drop or modify the messages. Filters are applied in the order specified in the pipeline.
1.4.1. Inputs
Inputs are configured in an array under spec.inputs. There are three built-in input types:
- application
- Selects logs from all application containers, excluding those in infrastructure namespaces.
- infrastructure
- Selects logs from nodes and from infrastructure components running in the following namespaces:
  - default
  - kube
  - openshift
  - Containing the kube- or openshift- prefix
- audit
- Selects logs from the OpenShift API server audit logs, Kubernetes API server audit logs, OVN audit logs, and node audit logs from auditd.
Users can define custom inputs of type application that select logs from specific namespaces or using pod labels.
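A custom input of this kind might look like the following sketch. The input name, namespace, and label are hypothetical, and the includes and selector field names reflect one reading of the observability.openshift.io/v1 API that should be checked against the CRD installed in your cluster.

```yaml
spec:
  inputs:
    - name: my-app-logs            # hypothetical input name
      type: application
      application:
        includes:                  # assumed field: limit collection to these namespaces
          - namespace: my-project  # hypothetical namespace
        selector:                  # assumed field: limit collection by pod label
          matchLabels:
            app: web               # hypothetical label
# ...
```

A pipeline can then list my-app-logs in its inputRefs in place of the built-in application input.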
1.4.2. Outputs
Outputs are configured in an array under spec.outputs. Each output must have a unique name and a type. Supported types are:
- azureMonitor
- Forwards logs to Azure Monitor.
- cloudwatch
- Forwards logs to AWS CloudWatch.
- elasticsearch
- Forwards logs to an external Elasticsearch instance.
- googleCloudLogging
- Forwards logs to Google Cloud Logging.
- http
- Forwards logs to a generic HTTP endpoint.
- kafka
- Forwards logs to a Kafka broker.
- loki
- Forwards logs to a Loki logging backend.
- lokiStack
- Forwards logs to the combination of Loki and a web proxy that logging supports, with OpenShift Container Platform authentication integration. The LokiStack proxy uses OpenShift Container Platform authentication to enforce multi-tenancy.
- otlp
- Forwards logs using the OpenTelemetry Protocol.
- splunk
- Forwards logs to Splunk.
- syslog
- Forwards logs to an external syslog server.
Each output type has its own configuration fields.
1.4.3. Pipelines
Pipelines are configured in an array under spec.pipelines. Each pipeline must have a unique name and consists of:
- inputRefs
- Names of inputs whose logs should be forwarded to this pipeline.
- outputRefs
- Names of outputs to send logs to.
- filterRefs
- (optional) Names of filters to apply.
The order of filterRefs matters, as they are applied sequentially. Earlier filters can drop messages that will not be processed by later filters.
1.4.4. Filters
Filters are configured in an array under spec.filters. They can match incoming log messages based on the value of structured fields and modify or drop them.
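To make the interaction between filters and filterRefs concrete, here is a minimal sketch of a drop filter followed by a multi-line exception filter. The filter and pipeline names, the .level field, and the regular expression are illustrative assumptions, and the drop filter's test, field, and matches keys should be verified against the API reference for your logging version.

```yaml
spec:
  filters:
    - name: drop-debug             # hypothetical filter name
      type: drop
      drop:
        - test:
            - field: .level        # assumed record field
              matches: "debug"     # regular expression; matching records are dropped
    - name: multiline              # hypothetical filter name
      type: detectMultilineException
  pipelines:
    - name: filtered-app-logs      # hypothetical pipeline name
      inputRefs:
        - application
      filterRefs:                  # applied in the order listed
        - drop-debug
        - multiline
      outputRefs:
        - <output_name>
# ...
```

Because drop-debug is listed first, records that it drops never reach the multiline filter.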
1.5. About forwarding logs to third-party systems
To send logs to specific endpoints inside and outside your OpenShift Container Platform cluster, you specify a combination of outputs and pipelines in a ClusterLogForwarder custom resource (CR). You can also use inputs to forward the application logs associated with a specific project to an endpoint. Authentication is provided by a Kubernetes Secret object.
- pipeline
- Defines simple routing from one log type to one or more outputs, or which logs you want to send. The log types are one of the following:
  - application. Container logs generated by user applications running in the cluster, except infrastructure container applications.
  - infrastructure. Container logs from pods that run in the openshift*, kube*, or default projects and journal logs sourced from the node file system.
  - audit. Audit logs generated by the node audit system, auditd, Kubernetes API server, OpenShift API server, and OVN network.
  You can add labels to outbound log messages by using key:value pairs in the pipeline. For example, you might add a label to messages that are forwarded to other data centers or label the logs by type. Labels that are added to objects are also forwarded with the log message.
- input
- Forwards the application logs associated with a specific project to a pipeline.
  In the pipeline, you define which log types to forward using the inputRefs parameter and where to forward the logs to using the outputRefs parameter.
- Secret
- A key:value map that contains confidential data such as user credentials.
Note the following:
- If you do not define a pipeline for a log type, the logs of the undefined types are dropped. For example, if you specify a pipeline for the application and audit types, but do not specify a pipeline for the infrastructure type, infrastructure logs are dropped.
- You can use multiple types of outputs in the ClusterLogForwarder custom resource (CR) to send logs to servers that support different protocols.
The following example forwards the application and infrastructure logs to a secure external Elasticsearch instance.
Sample log forwarding outputs and pipelines
kind: ClusterLogForwarder
apiVersion: observability.openshift.io/v1
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-admin
  outputs:
    - name: external-es
      type: elasticsearch
      elasticsearch:
        url: 'https://example-elasticsearch-secure.com:9200'
        version: 8 # 1
        index: '{.log_type||"undefined"}' # 2
        authentication:
          username:
            key: username
            secretName: es-secret # 3
          password:
            key: password
            secretName: es-secret # 4
      tls:
        ca: # 5
          key: ca-bundle.crt
          secretName: es-secret
        certificate:
          key: tls.crt
          secretName: es-secret
        key:
          key: tls.key
          secretName: es-secret
  pipelines:
    - name: my-logs
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - external-es
- 1
- Forwarding to an external Elasticsearch of version 8.x or greater requires the version field to be specified.
- 2
- index is set to read the field value .log_type and falls back to "undefined" if not found.
- 3 4
- Use username and password to authenticate to the server.
- 5
- Enable Mutual Transport Layer Security (mTLS) between the collector and Elasticsearch. The spec identifies the keys and secret to the respective certificates that they represent.
Supported Authorization Keys
Common key types are provided here. Some output types support additional specialized keys, documented with the output-specific configuration field. All secret keys are optional. Enable the security features you want by setting the relevant keys. You are responsible for creating and maintaining any additional configurations that external destinations might require, such as keys and secrets, service accounts, port openings, or global proxy configuration. OpenShift Logging will not attempt to verify a mismatch between authorization combinations.
- Transport Layer Security (TLS)
- Using a TLS URL (https://... or ssl://...) without a secret enables basic TLS server-side authentication. Additional TLS features are enabled by including a secret and setting the following optional fields:
  - passphrase: (string) Passphrase to decode an encoded TLS private key. Requires tls.key.
  - ca-bundle.crt: (string) File name of a customer CA for server authentication.
- Username and Password
  - username: (string) Authentication user name. Requires password.
  - password: (string) Authentication password. Requires username.
- Simple Authentication and Security Layer (SASL)
  - sasl.enable: (boolean) Explicitly enable or disable SASL. If missing, SASL is automatically enabled when any of the other sasl. keys are set.
  - sasl.mechanisms: (array) List of allowed SASL mechanism names. If missing or empty, the system defaults are used.
  - sasl.allow-insecure: (boolean) Allow mechanisms that send clear-text passwords. Defaults to false.
1.5.1. Creating a Secret
You can create a secret in the directory that contains your certificate and key files by using the following command:
$ oc create secret generic -n <namespace> <secret_name> \
  --from-file=ca-bundle.crt=<your_bundle_file> \
  --from-literal=username=<your_username> \
  --from-literal=password=<your_password>
Generic or opaque secrets are recommended for best results.
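If you prefer to manage the secret as a manifest, the command above corresponds roughly to the following sketch. The placeholders mirror those in the command, and the certificate body is a stand-in for the contents of your CA bundle file.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <secret_name>
  namespace: <namespace>
type: Opaque
stringData:                        # plain-text values; the API server stores them Base64-encoded
  username: <your_username>
  password: <your_password>
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <contents_of_your_bundle_file>
    -----END CERTIFICATE-----
```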
1.6. Creating a log forwarder
To create a log forwarder, create a ClusterLogForwarder custom resource (CR). This CR defines the service account, permissible input log types, pipelines, outputs, and any optional filters.
You need administrator permissions for the namespace where you create the ClusterLogForwarder CR.
ClusterLogForwarder CR example
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  outputs:
    - name: <output_name>
      type: <output_type> # 1
  inputs: # 2
    - name: <input_name>
      type: <input_type>
  filters: # 3
    - name: <filter_name>
      type: <filter_type>
  pipelines:
    - name: <pipeline_name>
      inputRefs:
        - <input_name> # 4
      outputRefs:
        - <output_name> # 5
      filterRefs:
        - <filter_name> # 6
  serviceAccount:
    name: <service_account_name> # 7
# ...
- 1
- The type of output that you want to forward logs to. The value of this field can be azureMonitor, cloudwatch, elasticsearch, googleCloudLogging, http, kafka, loki, lokiStack, otlp, splunk, or syslog.
- 2
- A list of inputs. The names application, audit, and infrastructure are reserved for the default inputs.
- 3
- A list of filters to apply to records going through this pipeline. Each filter is applied in the order defined here. If a filter drops a record, subsequent filters are not applied.
- 4
- This value should be the same as the input name. You can also use the default input names application, infrastructure, and audit.
- 5
- This value should be the same as the output name.
- 6
- This value should be the same as the filter name.
- 7
- The name of your service account.
1.7. Tuning log payloads and delivery
The tuning spec in the ClusterLogForwarder custom resource (CR) provides a means of configuring your deployment to prioritize either throughput or durability of logs.
For example, if you need to reduce the possibility of log loss when the collector restarts, or you require collected log messages to survive a collector restart to support regulatory mandates, you can tune your deployment to prioritize log durability. If you use outputs that have hard limitations on the size of batches they can receive, you may want to tune your deployment to prioritize log throughput.
To use this feature, your logging deployment must be configured to use the Vector collector. The tuning spec in the ClusterLogForwarder CR is not supported when using the Fluentd collector.
The following example shows the ClusterLogForwarder CR options that you can modify to tune log forwarder outputs:
Example ClusterLogForwarder CR tuning options
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
# ...
spec:
# ...
  outputs:
    - name: default-lokistack
      type: lokiStack
      lokiStack:
        tuning:
          deliveryMode: AtLeastOnce # 1
          compression: none # 2
          maxWrite: <integer> # 3
          minRetryDuration: 1s # 4
          maxRetryDuration: 1s # 5
# ...
- 1
- Specify the delivery mode for log forwarding.
  - AtLeastOnce delivery means that if the log forwarder crashes or is restarted, any logs that were read before the crash but not sent to their destination are re-sent. It is possible that some logs are duplicated after a crash.
  - AtMostOnce delivery means that the log forwarder makes no effort to recover logs lost during a crash. This mode gives better throughput, but may result in greater log loss.
- 2
- Specifying a compression configuration causes data to be compressed before it is sent over the network. Note that not all output types support compression, and if the specified compression type is not supported by the output, this results in an error. For more information, see "Supported compression types for tuning outputs".
- 3
- Specifies a limit for the maximum payload of a single send operation to the output.
- 4
- Specifies a minimum duration to wait between attempts before retrying delivery after a failure. This value is a string, and can be specified as milliseconds (ms), seconds (s), or minutes (m).
- 5
- Specifies a maximum duration to wait between attempts before retrying delivery after a failure. This value is a string, and can be specified as milliseconds (ms), seconds (s), or minutes (m).
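For contrast with the durability-oriented example, a throughput-oriented configuration might look like the following sketch. The retry durations are arbitrary illustrative values rather than recommendations, and compression and maxWrite are omitted because suitable values depend on the output type.

```yaml
spec:
  outputs:
    - name: <output_name>
      type: lokiStack
      lokiStack:
        tuning:
          deliveryMode: AtMostOnce   # no recovery of unsent logs after a collector crash
          minRetryDuration: 500ms    # illustrative value
          maxRetryDuration: 1m       # illustrative value
# ...
```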
Table 1.1. Supported compression types for tuning outputs
| Compression algorithm | Splunk | Amazon Cloudwatch | Elasticsearch 8 | LokiStack | Apache Kafka | HTTP | Syslog | Google Cloud | Microsoft Azure Monitoring |
|---|---|---|---|---|---|---|---|---|---|
|
| X | X | X | X | X | ||||
|
| X | X | X | X | |||||
|
| X | X | X | ||||||
|
| X | X | X | ||||||
|
| X |
1.7.1. Enabling multi-line exception detection
You can enable multi-line error detection of container logs.
Enabling this feature could have performance implications and may require additional computing resources or alternate logging solutions.
Log parsers often incorrectly identify separate lines of the same exception as separate exceptions. This leads to extra log entries and an incomplete or inaccurate view of the traced information.
Example Java exception
java.lang.NullPointerException: Cannot invoke "String.toString()" because "<param1>" is null
    at testjava.Main.handle(Main.java:47)
    at testjava.Main.printMe(Main.java:19)
    at testjava.Main.main(Main.java:10)
- To enable logging to detect multi-line exceptions and reassemble them into a single log entry, ensure that the ClusterLogForwarder Custom Resource (CR) contains a filter of type detectMultilineException under .spec.filters and that a pipeline references that filter.
Example ClusterLogForwarder CR
apiVersion: "observability.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  serviceAccount:
    name: <service_account_name>
  filters:
    - name: <filter-name>
      type: detectMultilineException
  pipelines:
    - inputRefs:
        - <input-name>
      name: <pipeline-name>
      filterRefs:
        - <filter-name>
      outputRefs:
        - <output-name>
1.7.1.1. Details
When log messages appear as a consecutive sequence forming an exception stack trace, they are combined into a single, unified log record. The first log message’s content is replaced with the concatenated content of all the message fields in the sequence.
The collector supports the following languages:
- Java
- JS
- Ruby
- Python
- Golang
- PHP
- Dart
1.8. Forwarding logs to Google Cloud Platform (GCP)
You can forward logs to Google Cloud Logging.
Forwarding logs to GCP is not supported on Red Hat OpenShift on AWS.
Prerequisites
- Red Hat OpenShift Logging Operator has been installed.
Procedure
Create a secret using your Google service account key.
$ oc -n openshift-logging create secret generic gcp-secret --from-file google-application-credentials.json=<your_service_account_key_file.json>
Create a ClusterLogForwarder Custom Resource YAML using the template below:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: openshift-logging
spec:
  serviceAccount:
    name: <service_account_name> # 1
  outputs:
    - name: gcp-1
      type: googleCloudLogging
      googleCloudLogging:
        authentication:
          credentials:
            secretName: gcp-secret
            key: google-application-credentials.json
        id:
          type: project
          value: openshift-gce-devel # 2
        logId: app-gcp # 3
  pipelines:
    - name: test-app
      inputRefs: # 4
        - application
      outputRefs:
        - gcp-1
- 1
- Specify the name of your service account.
- 2
- Set a project, folder, organization, or billingAccount field and its corresponding value, depending on where you want to store your logs in the GCP resource hierarchy.
- 3
- Set the value to add to the logName field of the log entry. The value can be a combination of static and dynamic values consisting of field paths followed by ||, followed by another field path or a static value. A dynamic value must be encased in single curly brackets {} and must end with a static fallback value separated with ||. Static values can only contain alphanumeric characters along with dashes, underscores, dots and forward slashes.
- 4
- Specify the names of inputs defined in the input.name field for this pipeline. You can also use the built-in values application, infrastructure, audit.
1.9. Forwarding logs to Splunk
You can forward logs to the Splunk HTTP Event Collector (HEC).
Prerequisites
- Red Hat OpenShift Logging Operator has been installed.
- You have obtained a Base64-encoded Splunk HEC token.
Procedure
Create a secret using your Base64-encoded Splunk HEC token.
$ oc -n openshift-logging create secret generic vector-splunk-secret --from-literal hecToken=<HEC_Token>
Create or edit the ClusterLogForwarder Custom Resource (CR) using the template below:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: openshift-logging
spec:
  serviceAccount:
    name: <service_account_name> # 1
  outputs:
    - name: splunk-receiver # 2
      type: splunk # 3
      splunk:
        url: '<http://your.splunk.hec.url:8088>' # 4
        authentication:
          token:
            secretName: vector-splunk-secret
            key: hecToken # 5
        index: '{.log_type||"undefined"}' # 6
        source: '{.log_source||"undefined"}' # 7
        indexedFields: ['.log_type', '.log_source'] # 8
        payloadKey: '.kubernetes' # 9
        tuning:
          compression: gzip # 10
  pipelines:
    - name: my-logs
      inputRefs: # 11
        - application
        - infrastructure
      outputRefs:
        - splunk-receiver # 12
- 1
- The name of your service account.
- 2
- Specify a name for the output.
- 3
- Specify the output type as splunk.
- 4
- Specify the URL, including port, of your Splunk HEC.
- 5
- Specify the name of the secret that contains your HEC token and the key under which the token is stored.
- 6
- Specify the name of the index to send events to. If you do not specify an index, the default index of the Splunk server configuration is used. This field is optional.
- 7
- Specify the source of events to be sent to this sink. You can configure dynamic per-event values. This field is optional.
- 8
- Specify the fields to be added to the Splunk index. This field is optional.
- 9
- Specify the record field to be used as the payload. This field is optional.
- 10
- Specify the compression configuration, which can be either gzip or none. The default value is none. This field is optional.
- 11
- Specify the input names.
- 12
- Specify the name of the output to use when forwarding logs with this pipeline.
1.10. Forwarding logs over HTTP
To enable forwarding logs over HTTP, specify http as the output type in the ClusterLogForwarder custom resource (CR).
Procedure
Create or edit the ClusterLogForwarder CR using the template below:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  managementState: Managed
  outputs:
    - name: <output_name>
      type: http
      http:
        headers: # 1
          h1: v1
          h2: v2
        authentication:
          username:
            key: username
            secretName: <http_auth_secret>
          password:
            key: password
            secretName: <http_auth_secret>
        timeout: 300
        proxyURL: <proxy_url> # 2
        url: <url> # 3
      tls:
        insecureSkipVerify: <boolean> # 4
        ca:
          key: <ca_certificate>
          secretName: <secret_name> # 5
  pipelines:
    - inputRefs:
        - application
      name: pipe1
      outputRefs:
        - <output_name> # 6
  serviceAccount:
    name: <service_account_name> # 7
- 1
- Additional headers to send with the log record.
- 2
- Optional: URL of the HTTP/HTTPS proxy that should be used to forward logs over http or https from this output. This setting overrides any default proxy settings for the cluster or the node.
- 3
- Destination address for logs.
- 4
- Values are either true or false.
- 5
- Secret name for destination credentials.
- 6
- This value should be the same as the output name.
- 7
- The name of your service account.
1.11. Forwarding logs to an external Loki logging system
To configure log forwarding to Loki, you must create a ClusterLogForwarder custom resource (CR) with an output to Loki, and a pipeline that uses the output. The output to Loki can use the HTTP (insecure) or HTTPS (secure HTTP) connection.
Prerequisites
- You have installed Red Hat OpenShift Logging Operator.
- You have administrator access to OpenShift Container Platform.
- You have installed OpenShift CLI (oc).
- You have a Loki logging system running at the URL specified in the url field.
Procedure
Create or edit a YAML file that defines the ClusterLogForwarder CR object:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  serviceAccount:
    name: <service_account_name> # 1
  outputs:
    - name: loki-output # 2
      type: loki # 3
      loki:
        authentication: # 4
          username:
            key: username
            secretName: to-loki-secret
          password:
            key: password
            secretName: to-loki-secret
          token:
            from: secret
            secret:
              key: ca-bundle.crt
              name: to-loki-secret
        labelKeys: # 5
          - <label_keys>
        tenantKey: '{.kubernetes.namespace_name||"application"}' # 6
        url: https://loki.secure.com:3100 # 7
  pipelines:
    - name: my-pipeline
      inputRefs: # 8
        - application
        - audit
      outputRefs: # 9
        - loki-output
- 1
- Specify the name of your service account.
- 2
- Specify a name for the output.
- 3
- Specify the type as loki.
- 4
- Specify the authentication information.
- 5
- Specify which log record keys are mapped to Loki stream labels. If you do not set the labelKeys field, the ClusterLogForwarder CR uses these default keys: log_type, kubernetes.container_name, kubernetes.namespace_name, kubernetes.pod_name.
- 6
- Specify the tenant for the logs. The value can be a combination of static and dynamic values consisting of field paths separated by ||. Encase dynamic values inside single curly brackets {}. Follow dynamic values with a static fallback value.
- 7
- Specify the URL for Loki.
- 8
- Specify the names of inputs defined in the input.name field for this pipeline. You can also use the built-in values application, infrastructure, audit.
- 9
- Specify the names of outputs defined in the outputs.name field for this pipeline.
Note
Because Loki requires log streams to be correctly ordered by timestamp, labelKeys always includes the kubernetes_host label set, even if you do not specify it. This inclusion ensures that each stream originates from a single host, which prevents timestamps from becoming disordered due to clock differences on different hosts.
Apply the ClusterLogForwarder CR object by running the following command:
$ oc apply -f <filename>.yaml
1.12. Forwarding logs to a Kafka broker
To configure log forwarding to an external Kafka instance, you must create a ClusterLogForwarder custom resource (CR) with an output to that instance, and a pipeline that uses the output. You can include a specific Kafka topic in the output or use the default. The Kafka output can use a TCP (insecure) or TLS (secure TCP) connection.
Procedure
Create or edit a YAML file that defines the ClusterLogForwarder CR object:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  serviceAccount:
    name: <service_account_name> 1
  outputs:
  - name: kafka-output 2
    type: kafka 3
    kafka:
      authentication:
        sasl:
          username:
            key: <key>
            secretName: kafka-secret 4
          password:
            key: <key>
            secretName: kafka-secret
          mechanism: <sasl_mechanism> 5
      url: tls://kafka.example.devlab.com:9093/app-topic 6
      brokers: 7
      - tls://kafka-broker1.example.com:9093
      - tls://kafka-broker2.example.com:9093
      topic: 8
  pipelines:
  - name: app-topic
    inputRefs:
    - application
    outputRefs:
    - kafka-output
- 1
- Specify the name of your service account.
- 2
- Specify a name for the output.
- 3
- Specify the kafka type.
- 4
- If you use a tls prefix in the URL, you must specify the name of the secret required by the endpoint for TLS communication.
- 5
- Specify the Simple Authentication and Security Layer (SASL) mechanism to use. For example, SCRAM-SHA-256, SCRAM-SHA-512, or PLAIN. The default value is PLAIN.
- 6
- Optional: Specify the URL and port of the Kafka broker as a valid absolute URL, optionally with a specific topic. You can use the tcp (insecure) or tls (secure TCP) protocol. If you enable the cluster-wide proxy using the CIDR annotation, the output must be a server name or FQDN, and not an IP address. You must specify either the URL or Kafka brokers.
- 7
- Optional: Specify a list of broker endpoints of a Kafka cluster.
- 8
- Specify the target topic. By default, the value for the field is topic. The topic name can be a combination of static and dynamic values consisting of field paths separated by "||". Encase dynamic values in single curly brackets "{}". Follow dynamic values with a static fallback value. The topic specified here overrides the topic defined in the url field.
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
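The rule for dynamic values (field paths in curly brackets, alternatives separated by ||, ending in a static fallback) can be illustrated with a short Python sketch. It only models the evaluation order described in this section and is not the collector's implementation. The helper names, the sample record, and the convention of writing the fallback in double quotes are assumptions made for the illustration.

```python
import re

def lookup(record, path):
    """Follow a simple dot-delimited path such as .kubernetes.namespace_name.

    Returns None if any segment is missing. Quoted segments are not handled here.
    """
    node = record
    for key in path.strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def resolve_template(template, record):
    """Replace each {...} group with the first alternative that resolves."""
    def substitute(match):
        for alternative in match.group(1).split("||"):
            alternative = alternative.strip()
            if alternative.startswith('"') and alternative.endswith('"'):
                return alternative[1:-1]  # static fallback value (assumed quoting)
            value = lookup(record, alternative)
            if value is not None:
                return str(value)
        return ""  # no fallback supplied; the text says to always provide one
    return re.sub(r"\{([^{}]*)\}", substitute, template)

# Hypothetical log record used only for this example.
record = {"kubernetes": {"namespace_name": "my-project"}, "log_type": "application"}

print(resolve_template('app-{.kubernetes.namespace_name||"none"}', record))
print(resolve_template('{.missing||.log_type||"none"}', record))
print(resolve_template('{.missing||"none"}', record))
```

The first call resolves the field path, the second falls through to the next field path, and the third falls back to the static value.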
1.13. Forwarding to Azure Monitor Logs
You can forward logs to Azure Monitor Logs. This functionality is provided by the Vector Azure Monitor Logs sink.
Prerequisites
- You have basic familiarity with Azure services.
- You have an Azure account configured for Azure Portal or Azure CLI access.
- You have obtained your Azure Monitor Logs primary or secondary security key.
- You have determined which log types to forward.
- You have installed the OpenShift CLI (oc).
- You have installed the Red Hat OpenShift Logging Operator.
- You have administrator permissions.
Procedure
- Enable log forwarding to Azure Monitor Logs via the HTTP Data Collector API:
Create a secret with your shared key:
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
  namespace: openshift-logging
type: Opaque
data:
  shared_key: <your_shared_key> 1
- 1
- Must contain a primary or secondary key for the Log Analytics workspace making the request. Because the value is set under data, it must be base64-encoded.
- To obtain a shared key, you can use this Azure PowerShell command:
Get-AzOperationalInsightsWorkspaceSharedKey -ResourceGroupName "<resource_name>" -Name "<workspace_name>"
- Create or edit your ClusterLogForwarder CR using the template matching your log selection.
Forward all logs
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: openshift-logging
spec:
  serviceAccount:
    name: <service_account_name> 1
  outputs:
  - name: azure-monitor
    type: azureMonitor
    azureMonitor:
      customerId: my-customer-id 2
      logType: my_log_type 3
      authentication:
        sharedKey:
          secretName: my-secret
          key: shared_key
  pipelines:
  - name: app-pipeline
    inputRefs:
    - application
    outputRefs:
    - azure-monitor
- 1
- The name of your service account.
- 2
- Unique identifier for the Log Analytics workspace. Required field.
- 3
- Record type of the data being submitted. May only contain letters, numbers, and underscores (_), and may not exceed 100 characters. For more information, see Azure record type in the Microsoft Azure documentation.
1.14. Forwarding application logs from specific projects
You can forward a copy of the application logs from specific projects to an external log aggregator, in addition to, or instead of, using the internal log store. You must also configure the external log aggregator to receive log data from OpenShift Container Platform.
To configure forwarding application logs from a project, you must create a ClusterLogForwarder custom resource (CR) with at least one input from a project, optional outputs for other log aggregators, and pipelines that use those inputs and outputs.
Prerequisites
- You must have a logging server that is configured to receive the logging data using the specified protocol or format.
Procedure
Create or edit a YAML file that defines the ClusterLogForwarder CR:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  serviceAccount:
    name: <service_account_name>
  outputs:
  - name: <output_name>
    type: <output_type>
  inputs:
  - name: my-app-logs 1
    type: application 2
    application:
      includes: 3
      - namespace: my-project
  filters:
  - name: my-project-labels
    type: openshiftLabels
    openshiftLabels: 4
      project: my-project
  - name: cluster-labels
    type: openshiftLabels
    openshiftLabels:
      clusterId: C1234
  pipelines:
  - name: <pipeline_name> 5
    inputRefs:
    - my-app-logs
    outputRefs:
    - <output_name>
    filterRefs:
    - my-project-labels
    - cluster-labels
- 1
- Specify the name for the input.
- 2
- Specify the type as application to collect logs from applications.
- 3
- Specify the set of namespaces and containers to include when collecting logs.
- 4
- Specify the labels to be applied to log records passing through this pipeline. These labels appear in the openshift.labels map in the log record.
- 5
- Specify a name for the pipeline.
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
1.15. Forwarding application logs from specific pods
As a cluster administrator, you can use Kubernetes pod labels to gather log data from specific pods and forward it to a log collector.
Suppose that you have an application composed of pods running alongside other pods in various namespaces. If those pods have labels that identify the application, you can gather and output their log data to a specific log collector.
To specify the pod labels, you use one or more matchLabels key-value pairs. If you specify multiple key-value pairs, the pods must match all of them to be selected.
Procedure
Create or edit a YAML file that defines the ClusterLogForwarder CR object. In the file, specify the pod labels using simple equality-based selectors under inputs[].application.selector.matchLabels, as shown in the following example.
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <log_forwarder_name>
  namespace: <log_forwarder_namespace>
spec:
  serviceAccount:
    name: <service_account_name> 1
  outputs:
  - <output_name>
  # ...
  inputs:
  - name: exampleAppLogData 2
    type: application 3
    application:
      includes: 4
      - namespace: app1
      - namespace: app2
      selector:
        matchLabels: 5
          environment: production
          app: nginx
  pipelines:
  - inputRefs:
    - exampleAppLogData
    outputRefs:
    # ...
- 1
- Specify the service account name.
- 2
- Specify a name for the input.
- 3
- Specify the type as application to collect logs from applications.
- 4
- Specify the set of namespaces to include when collecting logs.
- 5
- Specify the key-value pairs of pod labels whose log data you want to gather. You must specify both a key and value, not just a key. To be selected, the pods must match all the key-value pairs.
Optional: You can send log data from additional applications that have different pod labels to the same pipeline.
- For each unique combination of pod labels, create an additional inputs[].name section similar to the one shown.
- Update the selector to match the pod labels of this application.
- Add the new inputs[].name value to inputRefs. For example:
  - inputRefs: [ myAppLogData, myOtherAppLogData ]
Create the CR object:
$ oc create -f <file-name>.yaml
1.15.1. Configuring content filters to drop unwanted log records
Collecting all cluster logs produces a large amount of data, which can be expensive to move and store. To reduce volume, you can configure the drop filter to exclude unwanted log records before forwarding. The log collector evaluates log streams against the filter and drops records that match specified conditions.
The drop filter uses the test field to define one or more conditions for evaluating log records. The filter applies the following rules to check whether to drop a record:
- A test passes if all its specified conditions evaluate to true.
- If a test passes, the filter drops the log record.
- If you define several tests in the drop filter configuration, the filter drops the log record if any of the tests pass.
- If there is an error evaluating a condition, for example, the referenced field is missing, that condition evaluates to false.
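The four rules above can be summarized in a small Python sketch. It is an illustration of the decision logic only, not the collector's code: the helper names and sample records are hypothetical, and treating the regular expression as an unanchored search is an assumption.

```python
import re

def lookup(record, path):
    """Follow a simple dot-delimited path; return None when a segment is missing."""
    node = record
    for key in path.strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def condition_holds(record, condition):
    """Evaluate one field condition; a missing field counts as false."""
    value = lookup(record, condition["field"])
    if value is None:
        return False
    if "matches" in condition:
        return re.search(condition["matches"], str(value)) is not None
    return re.search(condition["notMatches"], str(value)) is None

def should_drop(record, drop_tests):
    """A test passes if all its conditions hold; the record is dropped if any test passes."""
    return any(
        all(condition_holds(record, c) for c in test["test"])
        for test in drop_tests
    )

# Two tests: drop openshift* namespaces, and drop application logs not from my-pod.
drop_tests = [
    {"test": [{"field": ".kubernetes.namespace_name", "matches": "openshift.*"}]},
    {"test": [
        {"field": ".log_type", "matches": "application"},
        {"field": ".kubernetes.pod_name", "notMatches": "my-pod"},
    ]},
]

infra = {"log_type": "infrastructure",
         "kubernetes": {"namespace_name": "openshift-dns", "pod_name": "dns-1"}}
wanted = {"log_type": "application",
          "kubernetes": {"namespace_name": "shop", "pod_name": "my-pod-abc"}}
other = {"log_type": "application",
         "kubernetes": {"namespace_name": "shop", "pod_name": "cart-1"}}
no_metadata = {"log_type": "application"}

print(should_drop(infra, drop_tests))        # first test passes
print(should_drop(wanted, drop_tests))       # no test passes
print(should_drop(other, drop_tests))        # second test passes
print(should_drop(no_metadata, drop_tests))  # missing fields evaluate to false
```

Note the last case: because a missing field makes its condition false, a record without Kubernetes metadata is kept even by a notMatches condition.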
Prerequisites
- You have installed the Red Hat OpenShift Logging Operator.
- You have administrator permissions.
- You have created a ClusterLogForwarder custom resource (CR).
- You have installed the OpenShift CLI (oc).
Procedure
Extract the existing ClusterLogForwarder configuration and save it as a local file.
$ oc get clusterlogforwarder <name> -n <namespace> -o yaml > <filename>.yaml
Where:
- <name> is the name of the ClusterLogForwarder instance you want to configure.
- <namespace> is the namespace where you created the ClusterLogForwarder instance, for example openshift-logging.
- <filename> is the name of the local file where you save the configuration.
Add a configuration to drop unwanted log records to the filters spec in the ClusterLogForwarder CR.
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  # ...
  filters:
  - name: drop-filter
    type: drop 1
    drop: 2
    - test: 3
      - field: .kubernetes.labels."app.version-1.2/beta" 4
        matches: .+ 5
      - field: .kubernetes.pod_name
        notMatches: "my-pod" 6
  pipelines:
  - name: my-pipeline 7
    filterRefs:
    - drop-filter
  # ...
- 1
- Specify the type of filter. The drop filter drops log records that match the filter configuration.
- 2
- Specify configuration options for the drop filter.
- 3
- Specify conditions for tests to evaluate whether the filter drops a log record.
- 4
- Specify dot-delimited paths to fields in log records.
  - Each path segment can contain alphanumeric characters and underscores, a-z, A-Z, 0-9, _, for example, .kubernetes.namespace_name.
  - If segments contain different characters, the segment must be in quotes, for example, .kubernetes.labels."app.version-1.2/beta".
  - You can include several field paths in a single test configuration, but they must all evaluate to true for the test to pass and the drop filter to apply.
- 5
- Specify a regular expression. If log records match this regular expression, they are dropped.
- 6
- Specify a regular expression. If log records do not match this regular expression, they are dropped.
- 7
- Specify the pipeline that uses the drop filter.
Note
You can set either the matches or notMatches condition for a single field path, but not both.
Example configuration that keeps only high-priority log records
# ...
filters:
- name: important
  type: drop
  drop:
  - test:
    - field: .message
      notMatches: "(?i)critical|error"
    - field: .level
      matches: "info|warning"
# ...
Example configuration with several tests
# ...
filters:
- name: important
  type: drop
  drop:
  - test: 1
    - field: .kubernetes.namespace_name
      matches: "openshift.*"
  - test: 2
    - field: .log_type
      matches: "application"
    - field: .kubernetes.pod_name
      notMatches: "my-pod"
# ...
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
1.15.2. API audit filter overview
OpenShift API servers generate audit events for every API call. These events include details about the request, the response, and the identity of the requester. This can lead to large volumes of data.
The API audit filter helps manage the audit trail by using rules to exclude non-essential events and to reduce the event size. Rules are checked in order, and checking stops at the first match. The amount of data in an event depends on the value of the level field:
- None: The event is dropped.
- Metadata: The event includes audit metadata and excludes request and response bodies.
- Request: The event includes audit metadata and the request body, and excludes the response body.
- RequestResponse: The event includes all data: metadata, request body and response body. The response body can be very large. For example, oc get pods -A generates a response body containing the YAML description of every pod in the cluster.
You can only use the API audit filter feature if the Vector collector is set up in your logging deployment.
The ClusterLogForwarder custom resource (CR) uses the same format as the standard Kubernetes audit policy. The ClusterLogForwarder CR provides the following additional functions:
- Wildcards
  Names of users, groups, namespaces, and resources can have a leading or trailing * asterisk character. For example, the openshift-* namespace matches openshift-apiserver or openshift-authentication namespaces. The */status resource matches Pod/status or Deployment/status resources.
- Default Rules
  Events that do not match any rule in the policy are filtered as follows:
  - Read-only system events such as get, list, and watch are dropped.
  - Service account write events that occur within the same namespace as the service account are dropped.
  - All other events are forwarded, subject to any configured rate limits.
  To disable these defaults, either end your rules list with a rule that has only a level field or add an empty rule.
- Omit Response Codes
  A list of integer status codes to omit. You can drop events based on the HTTP status code in the response by using the OmitResponseCodes field, which lists HTTP status codes for which no events are created. The default value is [404, 409, 422, 429]. If the value is an empty list, [], no status codes are omitted.
The ClusterLogForwarder CR audit policy acts in addition to the OpenShift Container Platform audit policy. The ClusterLogForwarder CR audit filter changes what the log collector forwards, and provides the ability to filter by verb, user, group, namespace, or resource. You can create multiple filters to send different summaries of the same audit stream to different places. For example, you can send a detailed stream to the local cluster log store, and a less detailed stream to a remote site.
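The ordered, first-match rule evaluation and the leading or trailing asterisk wildcard can be sketched in Python as follows. This is a simplified illustration, not the collector's implementation: the flat event and rule dictionaries, the helper names, and the NO_MATCH marker are assumptions, and the default rules for unmatched events are deliberately not modeled.

```python
NO_MATCH = "no-rule-matched"  # hypothetical marker: the default rules would apply

def wildcard_match(pattern, value):
    """Match a name against a pattern with an optional leading or trailing asterisk."""
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return pattern == value

def field_matches(patterns, value):
    """A rule field that is absent or empty places no restriction on the event."""
    return not patterns or any(wildcard_match(p, value) for p in patterns)

def audit_level(event, rules):
    """Return the level of the first rule that matches; later rules are ignored."""
    for rule in rules:
        if (field_matches(rule.get("users"), event["user"])
                and field_matches(rule.get("verbs"), event["verb"])
                and field_matches(rule.get("namespaces"), event["namespace"])
                and field_matches(rule.get("resources"), event["resource"])):
            return rule["level"]
    return NO_MATCH

# Simplified rules: real policies nest resources under API groups.
rules = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": ["endpoints", "services"]},
    {"level": "Request", "resources": ["configmaps"], "namespaces": ["openshift-*"]},
    {"level": "Metadata", "resources": ["*/status"]},
]

proxy_watch = {"user": "system:kube-proxy", "verb": "watch",
               "namespace": "default", "resource": "endpoints"}
configmap_update = {"user": "alice", "verb": "update",
                    "namespace": "openshift-logging", "resource": "configmaps"}
status_patch = {"user": "alice", "verb": "patch",
                "namespace": "shop", "resource": "pods/status"}
secret_create = {"user": "alice", "verb": "create",
                 "namespace": "shop", "resource": "secrets"}

for event in (proxy_watch, configmap_update, status_patch, secret_create):
    print(event["resource"], "->", audit_level(event, rules))
```

Because checking stops at the first match, rule order matters: placing a broad rule before a narrower one hides the narrower rule.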
- You must have the collect-audit-logs cluster role to collect the audit logs.
- The following example illustrates the range of rules possible in an audit policy and is not a recommended configuration.
Example audit policy
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: example-service-account
  pipelines:
  - name: my-pipeline
    inputRefs:
    - audit 1
    filterRefs:
    - my-policy 2
    outputRefs:
    - my-output
  filters:
  - name: my-policy
    type: kubeAPIAudit
    kubeAPIAudit:
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
      - "RequestReceived"
      rules:
      # Log pod changes at RequestResponse level
      - level: RequestResponse
        resources:
        - group: ""
          resources: ["pods"]
      # Log "pods/log", "pods/status" at Metadata level
      - level: Metadata
        resources:
        - group: ""
          resources: ["pods/log", "pods/status"]
      # Don't log requests to a configmap called "controller-leader"
      - level: None
        resources:
        - group: ""
          resources: ["configmaps"]
          resourceNames: ["controller-leader"]
      # Don't log watch requests by the "system:kube-proxy" on endpoints or services
      - level: None
        users: ["system:kube-proxy"]
        verbs: ["watch"]
        resources:
        - group: "" # core API group
          resources: ["endpoints", "services"]
      # Don't log authenticated requests to certain non-resource URL paths.
      - level: None
        userGroups: ["system:authenticated"]
        nonResourceURLs:
        - "/api*" # Wildcard matching.
        - "/version"
      # Log the request body of configmap changes in kube-system.
      - level: Request
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
        # This rule only applies to resources in the "kube-system" namespace.
        # The empty string "" can be used to select non-namespaced resources.
        namespaces: ["kube-system"]
      # Log configmap and secret changes in all other namespaces at the Metadata level.
      - level: Metadata
        resources:
        - group: "" # core API group
          resources: ["secrets", "configmaps"]
      # Log all other resources in core and extensions at the Request level.
      - level: Request
        resources:
        - group: "" # core API group
        - group: "extensions" # Version of group should NOT be included.
      # A catch-all rule to log all other requests at the Metadata level.
      - level: Metadata
1.15.3. Filtering application logs at input by including the label expressions or a matching label key and values
You can include the application logs based on the label expressions or a matching label key and its values by using the input selector.
Procedure
Add a configuration for a filter to the input spec in the ClusterLogForwarder CR.
The following example shows how to configure the ClusterLogForwarder CR to include logs based on label expressions or matched label key/values:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
# ...
spec:
  serviceAccount:
    name: <service_account_name>
  inputs:
  - name: mylogs
    application:
      selector:
        matchExpressions:
        - key: env 1
          operator: In 2
          values: ["prod", "qa"] 3
        - key: zone
          operator: NotIn
          values: ["east", "west"]
        matchLabels: 4
          app: one
          name: app1
    type: application
  # ...
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
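As a rough illustration of how such a selector decides whether a pod is selected, the following Python sketch applies standard Kubernetes label selector semantics: every matchLabels pair and every matchExpressions entry must hold. The function name and the sample pod labels are hypothetical, and the sketch assumes the collector follows the usual Kubernetes rules, including the Exists and DoesNotExist operators that the example above does not use.

```python
def selector_matches(pod_labels, selector):
    """Return True only if all matchLabels pairs and all matchExpressions hold."""
    for key, value in selector.get("matchLabels", {}).items():
        if pod_labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        present = expr["key"] in pod_labels
        value = pod_labels.get(expr["key"])
        operator = expr["operator"]
        if operator == "In" and (not present or value not in expr["values"]):
            return False
        if operator == "NotIn" and present and value in expr["values"]:
            return False
        if operator == "Exists" and not present:
            return False
        if operator == "DoesNotExist" and present:
            return False
    return True

# The selector from the example above, written as a Python dictionary.
selector = {
    "matchExpressions": [
        {"key": "env", "operator": "In", "values": ["prod", "qa"]},
        {"key": "zone", "operator": "NotIn", "values": ["east", "west"]},
    ],
    "matchLabels": {"app": "one", "name": "app1"},
}

base = {"app": "one", "name": "app1"}
print(selector_matches({**base, "env": "prod", "zone": "north"}, selector))
print(selector_matches({**base, "env": "prod", "zone": "east"}, selector))
print(selector_matches({**base, "env": "dev"}, selector))
print(selector_matches({**base, "env": "qa"}, selector))
```

Under these semantics a pod with no zone label still satisfies the NotIn expression, while a pod with no env label fails the In expression.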
1.15.4. Configuring content filters to prune log records
If you configure the prune filter, the log collector evaluates log streams against the filters before forwarding. The collector prunes log records by removing low-value fields such as pod annotations.
Prerequisites
- You have installed the Red Hat OpenShift Logging Operator.
- You have administrator permissions.
- You have created a ClusterLogForwarder custom resource (CR).
- You have installed the OpenShift CLI (oc).
Procedure
Extract the existing ClusterLogForwarder configuration and save it as a local file.
$ oc get clusterlogforwarder <name> -n <namespace> -o yaml > <filename>.yaml
Where:
- <name> is the name of the ClusterLogForwarder instance you want to configure.
- <namespace> is the namespace where you created the ClusterLogForwarder instance, for example openshift-logging.
- <filename> is the name of the local file where you save the configuration.
Add a configuration to prune log records to the filters spec in the ClusterLogForwarder CR.
Important
If you specify both in and notIn parameters, the notIn array takes precedence over in during pruning. After records are pruned by using the notIn array, they are then pruned by using the in array.
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: my-account
  filters:
  - name: prune-filter
    type: prune 1
    prune: 2
      in: [.kubernetes.annotations, .kubernetes.namespace_id] 3
      notIn: [.kubernetes,.log_type,.message,."@timestamp",.log_source] 4
  pipelines:
  - name: my-pipeline 5
    filterRefs: ["prune-filter"]
  # ...
- 1
- Specify the type of filter. The prune filter prunes log records by configured fields.
- 2
- Specify configuration options for the prune filter.
  - The in and notIn fields are arrays of dot-delimited paths to fields in log records.
  - Each path segment can contain alphanumeric characters and underscores, a-z, A-Z, 0-9, _, for example, .kubernetes.namespace_name.
  - If segments contain different characters, the segment must be in quotes, for example, .kubernetes.labels."app.version-1.2/beta".
- 3
- Optional: Specify fields to remove from the log record. The log collector keeps all other fields.
- 4
- Optional: Specify fields to keep in the log record. The log collector removes all other fields.
- 5
- Specify the pipeline that the prune filter is applied to.
Important
- The filters cannot remove the .log_type, .log_source, and .message fields from the log records. You must include them in the notIn field.
- If you use the googleCloudLogging output, you must include .hostname in the notIn field.
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
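The order of operations described in this procedure (keep only the notIn paths, then remove the in paths) can be sketched in Python. This is an illustrative model on nested dictionaries, not the collector's implementation; the helper names and the sample record are hypothetical.

```python
import copy
import re

def split_path(path):
    """Split .a.b."c.d" into ['a', 'b', 'c.d']; quoted segments may hold any character."""
    return [quoted or bare
            for quoted, bare in re.findall(r'"([^"]+)"|([A-Za-z0-9_]+)', path)]

def find(record, keys):
    """Return (True, value) if the key sequence exists in the record, else (False, None)."""
    node = record
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node

def keep_only(record, paths):
    """Build a new record that contains only the listed paths (notIn behavior)."""
    kept = {}
    for path in paths:
        keys = split_path(path)
        found, value = find(record, keys)
        if not found:
            continue
        target = kept
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = copy.deepcopy(value)
    return kept

def remove(record, paths):
    """Return a copy of the record without the listed paths (in behavior)."""
    pruned = copy.deepcopy(record)
    for path in paths:
        keys = split_path(path)
        found, parent = find(pruned, keys[:-1])
        if found and isinstance(parent, dict):
            parent.pop(keys[-1], None)
    return pruned

def prune(record, in_paths=None, not_in_paths=None):
    """Apply notIn first, then in, as described in the Important note."""
    result = record
    if not_in_paths:
        result = keep_only(result, not_in_paths)
    if in_paths:
        result = remove(result, in_paths)
    return result

record = {
    "@timestamp": "2024-01-01T00:00:00Z",
    "message": "hello",
    "log_type": "application",
    "log_source": "container",
    "hostname": "node-1",
    "kubernetes": {"namespace_id": "abc", "annotations": {"a": "b"}, "pod_name": "my-pod"},
}
pruned = prune(
    record,
    in_paths=[".kubernetes.annotations", ".kubernetes.namespace_id"],
    not_in_paths=[".kubernetes", ".log_type", ".message", '."@timestamp"', ".log_source"],
)
print(pruned)
```

With the paths from the example CR, hostname is dropped because it is not listed in notIn, and the annotations and namespace_id fields are then removed from the kept kubernetes map.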
1.16. Filtering the audit and infrastructure log inputs by source
You can define the list of audit and infrastructure sources to collect logs from by using the input selector.
Procedure
Add a configuration to define the audit and infrastructure sources in the ClusterLogForwarder CR.
The following example shows how to configure the ClusterLogForwarder CR to define audit and infrastructure sources:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
# ...
spec:
  serviceAccount:
    name: <service_account_name>
  inputs:
  - name: mylogs1
    type: infrastructure
    infrastructure:
      sources: 1
      - node
  - name: mylogs2
    type: audit
    audit:
      sources: 2
      - kubeAPI
      - openshiftAPI
      - ovn
  # ...
- 1
- Specifies the list of infrastructure sources to collect. The valid sources include:
  - node: Journal log from the node
  - container: Logs from the workloads deployed in the namespaces
- 2
- Specifies the list of audit sources to collect. The valid sources include:
  - kubeAPI: Logs from the Kubernetes API servers
  - openshiftAPI: Logs from the OpenShift API servers
  - auditd: Logs from a node auditd service
  - ovn: Logs from an Open Virtual Network (OVN) service
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
1.17. Filtering application logs at input by including or excluding the namespace or container name
You can include or exclude the application logs based on the namespace and container name by using the input selector.
Procedure
Add a configuration to include or exclude the namespace and container names in the ClusterLogForwarder CR.
The following example shows how to configure the ClusterLogForwarder CR to include or exclude namespaces and container names:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
# ...
spec:
  serviceAccount:
    name: <service_account_name>
  inputs:
  - name: mylogs
    application:
      includes:
      - namespace: "my-project" 1
        container: "my-container" 2
      excludes:
      - container: "other-container*" 3
        namespace: "other-namespace" 4
    type: application
  # ...
Note
The excludes field takes precedence over the includes field.
Apply the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
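The precedence of excludes over includes can be sketched in Python. This is a minimal model under stated assumptions rather than the collector's logic: it assumes glob-style matching (as suggested by the other-container* pattern), that an omitted namespace or container matches anything, and that an empty includes list means everything is included. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

def rule_matches(rule, namespace, container):
    """A rule matches when both its namespace and container patterns match."""
    return (fnmatchcase(namespace, rule.get("namespace", "*"))
            and fnmatchcase(container, rule.get("container", "*")))

def is_collected(namespace, container, includes=(), excludes=()):
    """Excludes are checked first, so they win over any matching include."""
    if any(rule_matches(rule, namespace, container) for rule in excludes):
        return False
    return not includes or any(
        rule_matches(rule, namespace, container) for rule in includes
    )

# Hypothetical rules chosen to show an overlap between includes and excludes.
includes = [{"namespace": "my-*"}]
excludes = [{"namespace": "my-secret-project"}, {"container": "other-container*"}]

print(is_collected("my-project", "my-container", includes, excludes))
print(is_collected("my-secret-project", "my-container", includes, excludes))
print(is_collected("my-project", "other-container-1", includes, excludes))
print(is_collected("unrelated", "my-container", includes, excludes))
```

The second and third calls match an include rule but are still rejected, because a matching exclude rule is decisive.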
Chapter 2. Configuring the logging collector
Logging for Red Hat OpenShift collects operations and application logs from your cluster and enriches the data with Kubernetes pod and project metadata. All supported modifications to the log collector can be performed through the spec.collector stanza in the ClusterLogForwarder custom resource (CR).
2.1. Creating a LogFileMetricExporter resource
You must manually create a LogFileMetricExporter custom resource (CR) to generate metrics from the logs produced by running containers, because it is not deployed with the collector by default.
If you do not create the LogFileMetricExporter CR, you might see a No datapoints found message in the OpenShift Container Platform web console dashboard for the Produced Logs field.
Prerequisites
- You have administrator permissions.
- You have installed the Red Hat OpenShift Logging Operator.
- You have installed the OpenShift CLI (oc).
Procedure
Create a LogFileMetricExporter CR as a YAML file:
Example LogFileMetricExporter CR
apiVersion: logging.openshift.io/v1alpha1
kind: LogFileMetricExporter
metadata:
  name: instance
  namespace: openshift-logging
spec:
  nodeSelector: {} 1
  resources: 2
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 200m
      memory: 128Mi
  tolerations: [] 3
# ...
Apply the LogFileMetricExporter CR by running the following command:
$ oc apply -f <filename>.yaml
Verification
Verify that the logfilesmetricexporter pods are running in the namespace where you have created the LogFileMetricExporter CR, by running the following command and observing the output:
$ oc get pods -l app.kubernetes.io/component=logfilesmetricexporter -n openshift-logging
Example output
NAME                           READY   STATUS    RESTARTS   AGE
logfilesmetricexporter-9qbjj   1/1     Running   0          2m46s
logfilesmetricexporter-cbc4v   1/1     Running   0          2m46s
A logfilesmetricexporter pod runs concurrently with a collector pod on each node.
2.2. Configure log collector CPU and memory limits
You can adjust both the CPU and memory limits for the log collector by editing the ClusterLogForwarder custom resource (CR).
Procedure
Edit the ClusterLogForwarder CR in the openshift-logging project:
$ oc -n openshift-logging edit clusterlogforwarder.observability.openshift.io <clf_name>
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <clf_name> 1
  namespace: openshift-logging
spec:
  collector:
    resources: 2
      requests:
        memory: 736Mi
      limits:
        cpu: 100m
        memory: 736Mi
# ...
2.3. Configuring input receivers
The Red Hat OpenShift Logging Operator deploys a service for each configured input receiver so that clients can write to the collector. This service exposes the port specified for the input receiver. For ClusterLogForwarder CR deployments, the service name is in the <clusterlogforwarder_resource_name>-<input_name> format.
2.3.1. Configuring the collector to receive audit logs as an HTTP server
You can configure your log collector to listen for HTTP connections to only receive audit logs by specifying http as a receiver input in the ClusterLogForwarder custom resource (CR).
HTTP receiver input is only supported for the following scenarios:
- Logging is installed on hosted control planes.
- When logs originate from a Red Hat-supported product that is installed on the same cluster as the Red Hat OpenShift Logging Operator. For example:
  - OpenShift Virtualization
Prerequisites
- You have administrator permissions.
- You have installed the OpenShift CLI (oc).
- You have installed the Red Hat OpenShift Logging Operator.
Procedure
Modify the ClusterLogForwarder CR to add configuration for the http receiver input:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <clusterlogforwarder_name> 1
  namespace: <namespace>
# ...
spec:
  serviceAccount:
    name: <service_account_name>
  inputs:
  - name: http-receiver 2
    type: receiver
    receiver:
      type: http 3
      port: 8443 4
      http:
        format: kubeAPIAudit 5
  outputs:
  - name: <output_name>
    type: http
    http:
      url: <url>
  pipelines: 6
  - name: http-pipeline
    inputRefs:
    - http-receiver
    outputRefs:
    - <output_name>
# ...
- 1
- Specify a name for the ClusterLogForwarder CR.
- 2
- Specify a name for your input receiver.
- 3
- Specify the input receiver type as http.
- 4
- Optional: Specify the port that the input receiver listens on. This must be a value between 1024 and 65535. The default value is 8443 if this is not specified.
- 5
- Currently, only the kube-apiserver webhook format is supported for http input receivers.
- 6
- Configure a pipeline for your input receiver.
Apply the changes to the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
Verify that the collector is listening on the service that has a name in the <clusterlogforwarder_resource_name>-<input_name> format by running the following command:
$ oc get svc
Example output
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
collector                 ClusterIP   172.30.85.239    <none>        24231/TCP   3m6s
collector-http-receiver   ClusterIP   172.30.205.160   <none>        8443/TCP    3m6s
In the example, the service name is collector-http-receiver.
Verification
Extract the certificate authority (CA) certificate file by running the following command:
$ oc extract cm/openshift-service-ca.crt -n <namespace>
Note
If the CA in the cluster where the collectors are running changes, you must extract the CA certificate file again.
As an example, use the curl command to send logs by running the following command:
$ curl --cacert <openshift_service_ca.crt> https://collector-http-receiver.<namespace>.svc:8443 -XPOST -d '{"<prefix>":"<message>"}'
Replace <openshift_service_ca.crt> with the extracted CA certificate file.
2.3.2. Configuring the collector to listen for connections as a syslog server
You can configure your log collector to collect journal-format infrastructure logs by specifying syslog as a receiver input in the ClusterLogForwarder custom resource (CR).
Syslog receiver input is only supported for the following scenarios:
- Logging is installed on hosted control planes.
- When logs originate from a Red Hat-supported product that is installed on the same cluster as the Red Hat OpenShift Logging Operator. For example:
  - Red Hat OpenStack Services on OpenShift (RHOSO)
  - OpenShift Virtualization
Prerequisites
- You have administrator permissions.
- You have installed the OpenShift CLI (oc).
- You have installed the Red Hat OpenShift Logging Operator.
Procedure
Grant the collect-infrastructure-logs cluster role to the service account by running the following command:
Example binding command
$ oc adm policy add-cluster-role-to-user collect-infrastructure-logs -z logcollector
Modify the ClusterLogForwarder CR to add configuration for the syslog receiver input:
Example ClusterLogForwarder CR
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: <clusterlogforwarder_name> 1
  namespace: <namespace>
# ...
spec:
  serviceAccount:
    name: <service_account_name> 2
  inputs:
  - name: syslog-receiver 3
    type: receiver
    receiver:
      type: syslog 4
      port: 10514 5
  outputs:
  - name: <output_name>
    lokiStack:
      authentication:
        token:
          from: serviceAccount
      target:
        name: logging-loki
        namespace: openshift-logging
    tls: 6
      ca:
        key: service-ca.crt
        configMapName: openshift-service-ca.crt
    type: lokiStack
  # ...
  pipelines: 7
  - name: syslog-pipeline
    inputRefs:
    - syslog-receiver
    outputRefs:
    - <output_name>
# ...
- 1 2
- Use the service account that you granted the collect-infrastructure-logs permission in the previous step.
- 3
- Specify a name for your input receiver.
- 4
- Specify the input receiver type as syslog.
- 5
- Optional: Specify the port that the input receiver listens on. This must be a value between 1024 and 65535.
- 6
- If TLS configuration is not set, the default certificates will be used. For more information, run the command oc explain clusterlogforwarders.spec.inputs.receiver.tls.
- 7
- Configure a pipeline for your input receiver.
Apply the changes to the ClusterLogForwarder CR by running the following command:
$ oc apply -f <filename>.yaml
Verify that the collector is listening on the service that has a name in the <clusterlogforwarder_resource_name>-<input_name> format by running the following command:
$ oc get svc
Example output
NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
collector                   ClusterIP   172.30.85.239    <none>        24231/TCP   33m
collector-syslog-receiver   ClusterIP   172.30.216.142   <none>        10514/TCP   2m20s
In this example output, the service name is collector-syslog-receiver.
Verification
Extract the certificate authority (CA) certificate file by running the following command:
$ oc extract cm/openshift-service-ca.crt -n <namespace>
Note
If the CA in the cluster where the collectors are running changes, you must extract the CA certificate file again.
As an example, use the curl command to send logs by running the following command:
$ curl --cacert <openshift_service_ca.crt> collector-syslog-receiver.<namespace>.svc:10514 "test message"
Replace <openshift_service_ca.crt> with the extracted CA certificate file.
Chapter 3. Configuring the log store
You can configure a LokiStack custom resource (CR) to store application, audit, and infrastructure-related logs.
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system offered as a GA log store for logging for Red Hat OpenShift that can be visualized with the OpenShift Observability UI. The Loki configuration provided by OpenShift Logging is a short-term log store designed to enable users to perform fast troubleshooting with the collected logs. For that purpose, the logging for Red Hat OpenShift configuration of Loki has short-term storage, and is optimized for very recent queries.
For long-term storage or queries over a long time period, use a log store that is external to your cluster. Loki sizing is only tested and supported for short-term storage, for a maximum of 30 days.
3.1. Loki deployment sizing
Sizing for Loki follows the format 1x.<size>, where the value 1x is the number of instances and <size> specifies performance capabilities.
The 1x.pico configuration defines a single Loki deployment with minimal resource and limit requirements, offering high availability (HA) support for all Loki components. This configuration is suited for deployments that do not require a single replication factor or auto-compaction.
Disk requests are similar across size configurations, allowing customers to test different sizes to determine the best fit for their deployment needs.
It is not possible to change the number 1x for the deployment size.
Table 3.1. Loki sizing
| | 1x.demo | 1x.pico [6.1+ only] | 1x.extra-small | 1x.small | 1x.medium |
|---|---|---|---|---|---|
| Data transfer | Demo use only | 50GB/day | 100GB/day | 500GB/day | 2TB/day |
| Queries per second (QPS) | Demo use only | 1-25 QPS at 200ms | 1-25 QPS at 200ms | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
| Replication factor | None | 2 | 2 | 2 | 2 |
| Total CPU requests | None | 7 vCPUs | 14 vCPUs | 34 vCPUs | 54 vCPUs |
| Total CPU requests if using the ruler | None | 8 vCPUs | 16 vCPUs | 42 vCPUs | 70 vCPUs |
| Total memory requests | None | 17Gi | 31Gi | 67Gi | 139Gi |
| Total memory requests if using the ruler | None | 18Gi | 35Gi | 83Gi | 171Gi |
| Total disk requests | 40Gi | 590Gi | 430Gi | 430Gi | 590Gi |
| Total disk requests if using the ruler | 60Gi | 910Gi | 750Gi | 750Gi | 910Gi |
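As a rough illustration of reading the "Data transfer" row of Table 3.1, the following shell sketch suggests the smallest size whose rating covers an expected daily ingestion volume. The function name suggest_loki_size is hypothetical, and the sketch assumes that the table figures can be treated as upper bounds and that 2TB/day means 2000 GB/day. It does not consider query load, CPU, memory, or disk.

```shell
#!/bin/sh
# Suggest the smallest Loki deployment size whose "Data transfer" rating in
# Table 3.1 covers an expected ingestion volume, given in whole GB per day.
# Assumptions (not stated in the table): ratings are upper bounds, and
# 2TB/day is taken as 2000 GB/day.
suggest_loki_size() {
    gb_per_day=$1
    if [ "$gb_per_day" -le 50 ]; then
        echo "1x.pico"            # marked as 6.1+ only in Table 3.1
    elif [ "$gb_per_day" -le 100 ]; then
        echo "1x.extra-small"
    elif [ "$gb_per_day" -le 500 ]; then
        echo "1x.small"
    elif [ "$gb_per_day" -le 2000 ]; then
        echo "1x.medium"
    else
        echo "no size in Table 3.1 covers ${gb_per_day} GB/day" >&2
        return 1
    fi
}

suggest_loki_size 120   # prints 1x.small
```

Treat the result only as a starting point; the table lists 1x.demo for demo use only, so the sketch never suggests it.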
3.2. Loki object storage
The Loki Operator supports AWS S3, as well as other S3 compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.
The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>.
The following table shows the type values within the LokiStack custom resource (CR) for each storage provider. For more information, see the section on your storage provider.
Table 3.2. Secret type quick reference
| Storage provider | Secret type value |
|---|---|
| AWS | s3 |
| Azure | azure |
| Google Cloud | gcs |
| Minio | s3 |
| OpenShift Data Foundation | s3 |
| Swift | swift |
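The quick reference above, together with the recommended logging-loki-<your_storage_provider> naming, can be sketched as two small shell helpers. The function names and the short provider keys (aws, azure, gcs, minio, odf, swift) are hypothetical labels chosen for this illustration; only the returned type values come from Table 3.2.

```shell
#!/bin/sh
# Map a storage provider key to the LokiStack secret type from Table 3.2.
# The provider keys are illustrative labels, not values defined by the Operator.
loki_secret_type() {
    case "$1" in
        aws|minio|odf) echo "s3" ;;
        azure)         echo "azure" ;;
        gcs)           echo "gcs" ;;
        swift)         echo "swift" ;;
        *) echo "unknown storage provider: $1" >&2; return 1 ;;
    esac
}

# Build a secret name that follows the recommended nomenclature.
loki_secret_name() {
    echo "logging-loki-$1"
}

echo "$(loki_secret_name minio) uses type $(loki_secret_type minio)"
```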
3.2.1. AWS storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on AWS.
- You created an AWS IAM Policy and IAM User.
Procedure
Create an object storage secret with the name logging-loki-aws by running the following command:
$ oc create secret generic logging-loki-aws \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<aws_bucket_endpoint>" \
  --from-literal=access_key_id="<aws_access_key_id>" \
  --from-literal=access_key_secret="<aws_access_key_secret>" \
  --from-literal=region="<aws_region_of_your_bucket>"
3.2.1.1. AWS storage for STS enabled clusters
If your cluster has STS enabled, the Cloud Credential Operator (CCO) supports short-term authentication using AWS tokens.
You can create the Loki object storage secret manually by running the following command:
$ oc -n openshift-logging create secret generic "logging-loki-aws" \
  --from-literal=bucketnames="<s3_bucket_name>" \
  --from-literal=region="<bucket_region>" \
  --from-literal=audience="<oidc_audience>" 1
- 1
- Optional annotation, default value is openshift.
3.2.2. Azure storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on Azure.
Procedure
Create an object storage secret with the name logging-loki-azure by running the following command:
$ oc create secret generic logging-loki-azure \
  --from-literal=container="<azure_container_name>" \
  --from-literal=environment="<azure_environment>" \ 1
  --from-literal=account_name="<azure_account_name>" \
  --from-literal=account_key="<azure_account_key>"
- 1
- Supported environment values are AzureGlobal, AzureChinaCloud, AzureGermanCloud, or AzureUSGovernment.
3.2.2.1. Azure storage for Microsoft Entra Workload ID enabled clusters
If your cluster has Microsoft Entra Workload ID enabled, the Cloud Credential Operator (CCO) supports short-term authentication using Workload ID.
You can create the Loki object storage secret manually by running the following command:
$ oc -n openshift-logging create secret generic logging-loki-azure \
  --from-literal=environment="<azure_environment>" \
  --from-literal=account_name="<storage_account_name>" \
  --from-literal=container="<container_name>"
3.2.3. Google Cloud Platform storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a project on Google Cloud Platform (GCP).
- You created a bucket in the same project.
- You created a service account in the same project for GCP authentication.
Procedure
- Copy the service account credentials received from GCP into a file called key.json.
- Create an object storage secret with the name logging-loki-gcs by running the following command:
$ oc create secret generic logging-loki-gcs \
  --from-literal=bucketname="<bucket_name>" \
  --from-file=key.json="<path/to/key.json>"
3.2.4. Minio storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You have Minio deployed on your cluster.
- You created a bucket on Minio.
Procedure
Create an object storage secret with the name logging-loki-minio by running the following command:
$ oc create secret generic logging-loki-minio \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<minio_bucket_endpoint>" \
  --from-literal=access_key_id="<minio_access_key_id>" \
  --from-literal=access_key_secret="<minio_access_key_secret>"
3.2.5. OpenShift Data Foundation storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You deployed OpenShift Data Foundation.
- You configured your OpenShift Data Foundation cluster for object storage.
Procedure
Create an ObjectBucketClaim custom resource in the openshift-logging namespace:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: loki-bucket-odf
  namespace: openshift-logging
spec:
  generateBucketName: loki-bucket-odf
  storageClassName: openshift-storage.noobaa.io
Get bucket properties from the associated ConfigMap object by running the following command:
BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}')
BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}')
BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
Get bucket access key from the associated secret by running the following command:
ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
Create an object storage secret with the name logging-loki-odf by running the following command:
$ oc create -n openshift-logging secret generic logging-loki-odf \
  --from-literal=access_key_id="<access_key_id>" \
  --from-literal=access_key_secret="<secret_access_key>" \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="https://<bucket_host>:<bucket_port>"
3.2.6. Swift storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on Swift.
Procedure
Create an object storage secret with the name logging-loki-swift by running the following command:
$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_usernameclaim>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>"
You can optionally provide project-specific data, region, or both by running the following command:
$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_usernameclaim>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>" \
  --from-literal=project_id="<swift_project_id>" \
  --from-literal=project_name="<swift_project_name>" \
  --from-literal=project_domain_id="<swift_project_domain_id>" \
  --from-literal=project_domain_name="<swift_project_domain_name>" \
  --from-literal=region="<swift_region>"
3.2.7. Deploying a Loki log store on a cluster that uses short-term credentials
For some storage providers, you can use the Cloud Credential Operator utility (ccoctl) during installation to implement short-term credentials. These credentials are created and managed outside the OpenShift Container Platform cluster. For more information, see Manual mode with short-term credentials for components.
Short-term credential authentication must be configured during a new installation of Loki Operator on a cluster that uses this credentials strategy. You cannot configure an existing cluster that uses a different credentials strategy to use this feature.
3.2.7.1. Authenticating with workload identity federation to access cloud-based log stores
You can use workload identity federation with short-lived tokens to authenticate to cloud-based log stores. With workload identity federation, you do not have to store long-lived credentials in your cluster, reducing the risk of credential leaks and simplifying secret management.
Prerequisites
- You have administrator permissions.
Procedure
Use one of the following options to enable authentication:
- If you used the OpenShift Container Platform web console to install the Loki Operator, the system automatically detects clusters that use short-lived tokens. You are prompted to create roles and supply the data required for the Loki Operator to create a CredentialsRequest object, which populates a secret.
- If you used the OpenShift CLI (oc) to install the Loki Operator, you must manually create a Subscription object. Use the appropriate template for your storage provider, as shown in the following samples. This authentication method supports only the providers listed in the examples.
Microsoft Azure sample subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: "stable-6.0"
  installPlanApproval: Manual
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
      - name: CLIENTID
        value: <your_client_id>
      - name: TENANTID
        value: <your_tenant_id>
      - name: SUBSCRIPTIONID
        value: <your_subscription_id>
      - name: REGION
        value: <your_region>
Amazon Web Services (AWS) sample subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: "stable-6.0"
  installPlanApproval: Manual
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: ROLEARN
      value: <role_ARN>
3.2.7.2. Creating a LokiStack custom resource by using the web console
You can create a LokiStack custom resource (CR) by using the OpenShift Container Platform web console.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed the Loki Operator.
Procedure
- Go to the Operators → Installed Operators page. Click the All instances tab.
- From the Create new drop-down list, select LokiStack.
Select YAML view, and then use the following template to create a LokiStack CR:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki 1
  namespace: openshift-logging
spec:
  size: 1x.small 2
  storage:
    schemas:
    - effectiveDate: '2023-10-15'
      version: v13
    secret:
      name: logging-loki-s3 3
      type: s3 4
      credentialMode: 5
  storageClassName: <storage_class_name> 6
  tenants:
    mode: openshift-logging
- 1
- Use the name logging-loki.
- 2
- Specify the deployment size. In the logging 5.8 and later versions, the supported size options for production instances of Loki are 1x.extra-small, 1x.small, or 1x.medium.
- 3
- Specify the secret used for your log storage.
- 4
- Specify the corresponding storage type.
- 5
- Optional field, logging 5.9 and later. Supported user configured values are as follows:
  static is the default authentication mode available for all supported object storage types using credentials stored in a Secret.
  token for short-lived tokens retrieved from a credential source. In this mode the static configuration does not contain credentials needed for the object storage. Instead, they are generated during runtime using a service, which allows for shorter-lived credentials and much more granular control. This authentication mode is not supported for all object storage types.
  token-cco is the default value when Loki is running on managed STS mode and using CCO on STS/WIF clusters.
- 6
- Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using the oc get storageclasses command.
3.2.7.3. Creating a secret for Loki object storage by using the CLI
To configure Loki object storage, you must create a secret. You can do this by using the OpenShift CLI (oc).
Prerequisites
- You have administrator permissions.
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
Procedure
Create a secret in the directory that contains your certificate and key files by running the following command:
$ oc create secret generic -n openshift-logging <your_secret_name> \
  --from-file=tls.key=<your_key_file> \
  --from-file=tls.crt=<your_crt_file> \
  --from-file=ca-bundle.crt=<your_bundle_file> \
  --from-literal=username=<your_username> \
  --from-literal=password=<your_password>
Use generic or opaque secrets for best results.
Verification
Verify that a secret was created by running the following command:
$ oc get secret -n openshift-logging
3.2.8. Fine-grained access for Loki logs
The Red Hat OpenShift Logging Operator does not grant all users access to logs by default. As an administrator, you must configure your users' access unless the Operator was upgraded and prior configurations are in place. Depending on your configuration and needs, you can configure fine-grained access to logs by using the following:
- Cluster-wide policies
- Namespace scoped policies
- Creation of custom admin groups
As an administrator, you need to create the role bindings and cluster role bindings appropriate for your deployment. The Red Hat OpenShift Logging Operator provides the following cluster roles:
- cluster-logging-application-view grants permission to read application logs.
- cluster-logging-infrastructure-view grants permission to read infrastructure logs.
- cluster-logging-audit-view grants permission to read audit logs.
If you have upgraded from a prior version, an additional logging-application-logs-reader cluster role and its associated logging-all-authenticated-application-logs-reader cluster role binding provide backward compatibility, allowing any authenticated user read access in their namespaces.
Users with access by namespace must provide a namespace when querying application logs.
3.2.8.1. Cluster-wide access
Cluster role binding resources reference cluster roles, and set permissions cluster-wide.
Example ClusterRoleBinding
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: logging-all-application-logs-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view 1
subjects: 2
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
3.2.8.2. Namespaced access
RoleBinding resources can be used with ClusterRole objects to define the namespaces in which a user or group can access logs.
Example RoleBinding
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: allow-read-logs
  namespace: log-test-0 1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: testuser-0
- 1
- Specifies the namespace this RoleBinding applies to.
3.2.8.3. Custom admin group access
If you have a large deployment with several users who require broader permissions, you can create a custom group by using the adminGroups field. Users who are members of any group specified in the adminGroups field of the LokiStack CR are considered administrators.
Administrator users have access to all application logs in all namespaces, if they are also assigned the cluster-logging-application-view role.
Example LokiStack CR
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  tenants:
    mode: openshift-logging 1
    openshift:
      adminGroups: 2
      - cluster-admin
      - custom-admin-group 3
3.2.9. Creating a new group for the cluster-admin user role
Querying application logs for multiple namespaces as a cluster-admin user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120). For better control over access to logs in LokiStack, make the cluster-admin user a member of the cluster-admin group. If the cluster-admin group does not exist, create it and add the desired users to it.
Use the following procedure to create a new group for users with cluster-admin permissions.
Procedure
Enter the following command to create a new group:
$ oc adm groups new cluster-admin
Enter the following command to add the desired user to the cluster-admin group:
$ oc adm groups add-users cluster-admin <username>
Enter the following command to add the cluster-admin user role to the group:
$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
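To estimate whether a cluster is close to the 5120-character limit described in this section, you can add up the lengths of all namespace names. The following is a minimal sketch: the function name total_name_chars is hypothetical, and it counts only the characters of the names themselves, so the size that Loki actually reports (XXXX in the error message) might be computed differently.

```shell
#!/bin/sh
# Sum the lengths of the names read from standard input, one name per line.
total_name_chars() {
    total=0
    while IFS= read -r name; do
        total=$((total + ${#name}))
    done
    echo "$total"
}

# Hypothetical usage against a live cluster (requires oc and a logged-in session):
#   oc get namespaces -o custom-columns=NAME:.metadata.name --no-headers | total_name_chars

# Self-contained demonstration with two sample names (7 + 17 characters):
printf 'default\nopenshift-logging\n' | total_name_chars   # prints 24
```

A result well above 5120 suggests that cluster-admin users who are not members of an administrator group are likely to hit the parse error when querying across all namespaces.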
3.3. Enhanced reliability and performance for Loki
Use the following configurations to ensure reliability and efficiency of Loki in production.
3.3.1. Loki pod placement
You can control which nodes the Loki pods run on, and prevent other workloads from using those nodes, by using tolerations or node selectors on the pods.
You can apply tolerations to the log store pods with the LokiStack custom resource (CR) and apply taints to a node with the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not tolerate the taint. Using a specific key:value pair that is not on other pods ensures that only the log store pods can run on that node.
Example LokiStack with node selectors
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor: 1
      nodeSelector:
        node-role.kubernetes.io/infra: "" 2
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
# ...
Example LokiStack CR with node selectors and tolerations
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
# ...
To configure the nodeSelector and tolerations fields of the LokiStack CR, you can use the oc explain command to view the description and fields for a particular resource:
$ oc explain lokistack.spec.template
Example output
KIND:     LokiStack
VERSION:  loki.grafana.com/v1
RESOURCE: template <Object>
DESCRIPTION:
     Template defines the resource/limits/tolerations/nodeselectors per
     component
FIELDS:
   compactor	<Object>
     Compactor defines the compaction component spec.
   distributor	<Object>
     Distributor defines the distributor component spec.
...
For more detailed information, you can add a specific field:
$ oc explain lokistack.spec.template.compactor
Example output
KIND:     LokiStack
VERSION:  loki.grafana.com/v1
RESOURCE: compactor <Object>
DESCRIPTION:
     Compactor defines the compaction component spec.
FIELDS:
   nodeSelector	<map[string]string>
     NodeSelector defines the labels required by a node to schedule the
     component onto it.
...
3.3.2. Configuring Loki to tolerate node failure
In the logging 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.
In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.
The Operator sets default, preferred podAntiAffinity rules for all Loki components, which include the compactor, distributor, gateway, indexGateway, ingester, querier, queryFrontend, and ruler components.
You can override the preferred podAntiAffinity settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution field:
Example user settings for the ingester component
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
      # ...
        requiredDuringSchedulingIgnoredDuringExecution: 1
        - labelSelector:
            matchLabels: 2
              app.kubernetes.io/component: ingester
          topologyKey: kubernetes.io/hostname
# ...
3.3.3. Enabling stream-based retention with Loki
You can configure retention policies based on log streams. Rules for these may be set globally, per-tenant, or both. If you configure both, tenant rules apply before global rules.
If no retention period is defined on the S3 bucket or in the LokiStack custom resource (CR), the logs are not pruned and stay in the S3 bucket forever, which might fill up the S3 storage.
Schema v13 is recommended.
Procedure
Create a LokiStack CR:
Enable stream-based retention globally as shown in the following example:
Example global stream-based retention for AWS
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global: 1
      retention: 2
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}' 3
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v13
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
- 1
- Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
- 2
- Retention is enabled in the cluster when this block is added to the CR.
- 3
- Contains the LogQL query used to define the log stream.
Enable stream-based retention on a per-tenant basis as shown in the following example:
Example per-tenant stream-based retention for AWS
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants: 1
      application:
        retention:
          days: 1
          streams:
            - days: 4
              selector: '{kubernetes_namespace_name=~"test.+"}' 2
      infrastructure:
        retention:
          days: 5
          streams:
            - days: 1
              selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v13
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
- 1
- Sets retention policy by tenant. Valid tenant types are application, audit, and infrastructure.
- 2
- Contains the LogQL query used to define the log stream.
Apply the LokiStack CR:
$ oc apply -f <filename>.yaml
3.3.4. Configuring Loki to tolerate memberlist creation failure
In an OpenShift Container Platform cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.
As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack custom resource (CR) to use the podIP address in the hashRing spec. To configure the LokiStack CR, use the following command:
$ oc patch LokiStack logging-loki -n openshift-logging --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP"},"type":"memberlist"}}}'
Example LokiStack to include podIP
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  hashRing:
    type: memberlist
    memberlist:
      instanceAddrType: podIP
# ...
3.3.5. LokiStack behavior during cluster restarts
When an OpenShift Container Platform cluster is restarted, LokiStack ingestion and the query path continue to operate within the CPU and memory resources available for the node. This means that there is no downtime for the LokiStack during OpenShift Container Platform cluster updates. This behavior is achieved by using PodDisruptionBudget resources. The Loki Operator provisions PodDisruptionBudget resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
3.4. Advanced deployment and scalability for Loki
You can configure high availability, scalability, and error handling for Loki.
3.4.1. Zone aware data replication
The Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra-small, 1x.small, or 1x.medium, the replication.factor field is automatically set to 2.
To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.
Example LokiStack CR with zone replication enabled
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  replicationFactor: 2 1
  replication:
    factor: 2 2
    zones:
    - maxSkew: 1 3
      topologyKey: topology.kubernetes.io/zone 4
- 1
- Deprecated field, values entered are overwritten by replication.factor.
- 2
- This value is automatically set when deployment size is selected at setup.
- 3
- The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
- 4
- Defines zones in the form of a topology key that corresponds to a node label.
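The requirement that the number of availability zones is at least the replication factor can be checked before enabling zone-aware replication. The following shell sketch is one possible way to do that: the function name zones_cover_replication is hypothetical, and it only counts distinct zone names that are supplied on standard input, one per line.

```shell
#!/bin/sh
# Succeed when the number of distinct zone names on standard input
# (one per line, blank lines ignored) is at least the replication factor.
zones_cover_replication() {
    factor=$1
    # Arithmetic expansion strips any padding that wc adds around the count.
    zone_count=$(( $(grep -v '^$' | sort -u | wc -l) ))
    [ "$zone_count" -ge "$factor" ]
}

# Hypothetical usage against a live cluster (requires oc; assumes nodes carry
# the topology.kubernetes.io/zone label used in the example CR):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}' \
#     | zones_cover_replication 2 && echo "enough zones"

# Self-contained demonstration: two distinct zones, replication factor 2.
printf 'us-east-1a\nus-east-1b\nus-east-1a\n' | zones_cover_replication 2 && echo "enough zones"
```

The check does not verify that each zone hosts an equal number of instances, which the text above also recommends.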
3.4.2. Recovering Loki pods from failed zones
In OpenShift Container Platform, a zone failure happens when specific availability zone resources become inaccessible. Availability zones are isolated areas within a cloud provider's data center, aimed at enhancing redundancy and fault tolerance. If your OpenShift Container Platform cluster is not configured to handle this, a zone failure can lead to service or data loss.
Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.
The following procedure will delete the PVCs in the failed zone, and all data contained therein. To avoid complete data loss, the replication factor field of the LokiStack CR should always be set to a value greater than 1 to ensure that Loki is replicating.
Prerequisites
- Verify your LokiStack CR has a replication factor greater than 1.
- Zone failure detected by the control plane, and nodes in the failed zone are marked by cloud provider integration.
The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.
Procedure
List the pods in Pending status by running the following command:
$ oc get pods --field-selector status.phase==Pending -n openshift-logging
Example oc get pods output
NAME                           READY   STATUS    RESTARTS   AGE   (1)
logging-loki-index-gateway-1   0/1     Pending   0          17m
logging-loki-ingester-1        0/1     Pending   0          16m
logging-loki-ruler-1           0/1     Pending   0          16m
- (1) These pods are in Pending status because their corresponding PVCs are in the failed zone.
List the PVCs in Pending status by running the following command:
$ oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r
Example oc get pvc output
storage-logging-loki-index-gateway-1
storage-logging-loki-ingester-1
wal-logging-loki-ingester-1
storage-logging-loki-ruler-1
wal-logging-loki-ruler-1
Delete the PVC(s) for a pod by running the following command:
$ oc delete pvc <pvc_name> -n openshift-logging
Delete the pod(s) by running the following command:
$ oc delete pod <pod_name> -n openshift-logging
Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.
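The PVC listing step in this procedure relies on jq. Where jq is not available, the same filter can be sketched in Python. This is a minimal illustration, not part of the product: the function name pending_pvc_names and the sample data are hypothetical, and the input is assumed to be the JSON printed by oc get pvc -o=json.

```python
import json


def pending_pvc_names(pvc_list_json: str) -> list:
    """Return the names of PVCs whose status.phase is "Pending".

    Mirrors the jq filter:
    .items[] | select(.status.phase == "Pending") | .metadata.name
    """
    doc = json.loads(pvc_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") == "Pending"
    ]


# Hypothetical, trimmed-down stand-in for `oc get pvc -o=json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "storage-logging-loki-ingester-0"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "storage-logging-loki-ingester-1"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "wal-logging-loki-ingester-1"},
         "status": {"phase": "Pending"}},
    ]
})

for name in pending_pvc_names(sample):
    print(name)
```

Each printed name can then be passed to the oc delete pvc command shown in the procedure.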
3.4.2.1. Troubleshooting PVC in a terminating state
The PVCs might hang in the terminating state without being deleted, if PVC metadata finalizers are set to kubernetes.io/pv-protection. Removing the finalizers should allow the PVCs to delete successfully.
Remove the finalizer for each PVC by running the command below, then retry deletion.
$ oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging
3.4.3. Troubleshooting Loki rate limit errors
If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.
These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.
In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).
The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.
Conditions
- The Log Forwarder API is configured to forward logs to Loki.
- Your system sends a block of messages that is larger than 2 MB to Loki. For example:
"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\ ....... ...... ...... ...... \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
- After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:
429 Too Many Requests Ingestion rate limit exceeded
Example Vector error message
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
The error is also visible on the receiving end. For example, in the LokiStack ingester pod:
Example Loki ingester error message
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
Procedure
Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16 # (1)
        ingestionRate: 8 # (2)
# ...
- (1) The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
- (2) The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
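How the two fields interact can be illustrated with a simple token-bucket model. The sketch below is an assumption made for explanation only and does not claim to reproduce Loki's internal limiter: the TokenBucket class and its method names are hypothetical, the burst size acts as the bucket capacity, and the rate acts as the refill speed in MB per second.

```python
class TokenBucket:
    """Illustrative model: burst = capacity in MB, rate = refill in MB/s."""

    def __init__(self, rate_mb_per_s: float, burst_mb: float) -> None:
        self.rate = rate_mb_per_s
        self.burst = burst_mb
        self.tokens = burst_mb  # the bucket starts full
        self.now = 0.0

    def allow(self, at: float, size_mb: float) -> bool:
        """Return True if a push of size_mb at time `at` (seconds) is accepted."""
        elapsed = at - self.now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.now = at
        if size_mb > self.tokens:
            return False  # in this model, the push would be answered with a 429
        self.tokens -= size_mb
        return True


# Values taken from the example CR: ingestionRate 8, ingestionBurstSize 16.
bucket = TokenBucket(rate_mb_per_s=8, burst_mb=16)
results = [
    bucket.allow(0.0, 10),  # accepted: the bucket is full
    bucket.allow(0.1, 10),  # rejected: not enough tokens have been refilled
    bucket.allow(1.0, 10),  # accepted: a retry after waiting succeeds
    bucket.allow(1.0, 20),  # rejected: larger than the burst size, can never pass
]
print(results)
```

In this model, a request that is rejected because of the rate succeeds on retry once enough time has passed, whereas a request larger than the burst size is rejected no matter how long the sender waits.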
3.5. Log-based alerts for Loki
You can configure log-based alerts for Loki by creating an AlertingRule custom resource (CR).
3.5.1. Authorizing LokiStack rules RBAC permissions
Administrators can allow users to create and manage their own alerting and recording rules by binding cluster roles to usernames. Cluster roles are defined as ClusterRole objects that contain necessary role-based access control (RBAC) permissions for users.
The following cluster roles for alerting and recording rules are available for LokiStack:
| Rule name | Description |
|---|---|
| | Users with this role have administrative-level access to manage alerting rules. This cluster role grants permissions to create, read, update, delete, list, and watch |
| | Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to |
| | Users with this role have permission to create, update, and delete |
| | Users with this role can read |
| | Users with this role have administrative-level access to manage recording rules. This cluster role grants permissions to create, read, update, delete, list, and watch |
| | Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to |
| | Users with this role have permission to create, update, and delete |
| | Users with this role can read |
3.5.1.1. Examples
To apply cluster roles for a user, you must bind an existing cluster role to a specific username.
Cluster roles can be cluster or namespace scoped, depending on which type of role binding you use. When a RoleBinding object is used, as when using the oc adm policy add-role-to-user command, the cluster role only applies to the specified namespace. When a ClusterRoleBinding object is used, as when using the oc adm policy add-cluster-role-to-user command, the cluster role applies to all namespaces in the cluster.
The following example command gives the specified user create, read, update and delete (CRUD) permissions for alerting rules in a specific namespace in the cluster:
Example cluster role binding command for alerting rule CRUD permissions in a specific namespace
$ oc adm policy add-role-to-user alertingrules.loki.grafana.com-v1-admin -n <namespace> <username>
The following command gives the specified user administrator permissions for alerting rules in all namespaces:
Example cluster role binding command for administrator permissions
$ oc adm policy add-cluster-role-to-user alertingrules.loki.grafana.com-v1-admin <username>
3.5.2. Creating a log-based alerting rule with Loki
The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
- If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
- If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
- If none of the above applies, an alerting rule is considered valid.
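Some of these conditions can be checked locally before applying a CR. The following sketch is an approximation under stated assumptions, not the webhook itself: the function name validate_alerting_rule_spec is hypothetical, interval and for values are assumed to use Prometheus-style durations such as 30s or 1h30m, and LogQL expr validation is left out because it requires a LogQL parser.

```python
import re

# Assumption: Prometheus-style durations such as "30s", "5m", or "1h30m".
_DURATION = re.compile(r"^(\d+(ms|s|m|h|d|w|y))+$")


def validate_alerting_rule_spec(spec: dict) -> list:
    """Return a list of problems found in the spec of an AlertingRule CR.

    Checks interval periods, for periods, and duplicate group names.
    LogQL expressions are not checked here.
    """
    errors = []
    seen_groups = set()
    for group in spec.get("groups", []):
        name = group.get("name", "")
        if name in seen_groups:
            errors.append(f"duplicate group name: {name}")
        seen_groups.add(name)
        interval = group.get("interval")
        if interval is not None and not _DURATION.match(str(interval)):
            errors.append(f"group {name}: invalid interval {interval!r}")
        for rule in group.get("rules", []):
            for_period = rule.get("for")
            if for_period is not None and not _DURATION.match(str(for_period)):
                errors.append(f"group {name}: invalid for period {for_period!r}")
    return errors


valid_spec = {
    "groups": [
        {"name": "g1", "interval": "1m", "rules": [{"alert": "A", "for": "10s"}]},
    ]
}
invalid_spec = {
    "groups": [
        {"name": "g1", "interval": "one minute", "rules": [{"alert": "A", "for": "10s"}]},
        {"name": "g1", "rules": [{"alert": "B", "for": "10"}]},
    ]
}
print(validate_alerting_rule_spec(valid_spec))
print(validate_alerting_rule_spec(invalid_spec))
```

An empty list means that none of the checked conditions were violated; the webhook remains the authoritative validator.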
Table 3.3. AlertingRule definitions
| Tenant type | Valid namespaces for AlertingRule CRs |
|---|---|
| application | |
| audit | |
| infrastructure | openshift-*, kube-*, default |
Procedure
Create an AlertingRule custom resource (CR):
Example infrastructure AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: loki-operator-alerts
  namespace: openshift-operators-redhat # (1)
  labels: # (2)
    openshift.io/<label_name>: "true"
spec:
  tenantID: "infrastructure" # (3)
  groups:
    - name: LokiOperatorHighReconciliationError
      rules:
        - alert: HighPercentageError
          expr: | # (4)
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
              /
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
              > 0.01
          for: 10s
          labels:
            severity: critical # (5)
          annotations:
            summary: High Loki Operator Reconciliation Errors # (6)
            description: High Loki Operator Reconciliation Errors # (7)
- (1) The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
- (2) The labels block must match the LokiStack spec.rules.selector definition.
- (3) AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
- (4) The value for kubernetes_namespace_name: must match the value for metadata.namespace.
- (5) The value of this mandatory field must be critical, warning, or info.
- (6) This field is mandatory.
- (7) This field is mandatory.
Example application AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-user-workload
  namespace: app-ns # (1)
  labels: # (2)
    openshift.io/<label_name>: "true"
spec:
  tenantID: "application"
  groups:
    - name: AppUserWorkloadHighError
      rules:
        - alert:
          expr: | # (3)
            sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
          for: 10s
          labels:
            severity: critical # (4)
          annotations:
            summary: # (5)
            description: # (6)
- (1) The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
- (2) The labels block must match the LokiStack spec.rules.selector definition.
- (3) The value for kubernetes_namespace_name: must match the value for metadata.namespace.
- (4) The value of this mandatory field must be critical, warning, or info.
- (5) The value of this mandatory field is a summary of the rule.
- (6) The value of this mandatory field is a detailed description of the rule.
Apply the AlertingRule CR:
$ oc apply -f <filename>.yaml
Chapter 4. Loki query performance troubleshooting
This documentation details methods for optimizing your Logging stack to improve query performance and provides steps for troubleshooting.
4.1. Best practices for Loki query performance
You can take the following steps to improve Loki query performance:
- Ensure that you are running the latest version of the Loki Operator.
- Ensure that you have migrated the LokiStack schema to the v13 version.
- Ensure that you use reliable and fast object storage. Loki places significant demands on object storage. If you are not using an object store solution from a cloud provider, use solid-state drives (SSDs) for your object storage. By using SSDs, you can benefit from the high parallelization capabilities of Loki.
To better understand the utilization of object storage by Loki, you can use the following query in the Metrics dashboard in the OpenShift Container Platform web console:
sum by(status, container, operation) (label_replace(rate(loki_s3_request_duration_seconds_count{namespace="openshift-logging"}[5m]), "status", "${1}xx", "status_code", "([0-9]).."))
- Loki Operator enables automatic stream sharding by default. The default automatic stream sharding mechanism should be adequate in most cases, and users should not need to configure perStream* attributes.
- If you use the OpenTelemetry Protocol (OTLP) data model, you can configure additional stream labels in LokiStack. For more information, see Best practices for Loki labels.
- Different types of queries have different performance characteristics. Use simple filter queries instead of regular expressions for better performance.
4.2. Best practices for Loki labels
Labels in Loki are the keyspace on which Loki shards incoming data. They are also the index used for finding logs at query-time. You can optimize query performance by properly using labels.
Consider the following criteria when creating labels:
- Labels should describe infrastructure. This could include regions, clusters, servers, applications, namespaces, or environments.
- Labels are long-lived. Label values should generate logs perpetually, or at least for several hours.
- Labels are intuitive for querying.
4.3. Configuration of stream labels in Loki Operator
Configuring which labels the Loki Operator will use as stream labels depends on the data model you are using: ViaQ or OpenTelemetry Protocol (OTLP).
Both models come with a predefined set of stream labels. For more information, see OpenTelemetry data model.
- ViaQ model
ViaQ does not support structured metadata. To configure stream labels for the ViaQ model, add the configuration in the ClusterLogForwarder resource. For example:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-collector
  outputs:
  - name: lokistack-out
    type: lokiStack
    lokiStack:
      target:
        name: logging-loki
        namespace: openshift-logging
      labelKeys:
        application:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        audit:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        infrastructure:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        global: []
The lokiStack.labelKeys field contains the configuration that maps log record keys to Loki labels used to identify streams.
- OTLP model
In the OTLP model, all labels that are not specified as stream labels are attached as structured metadata.
The following are the best practices for creating stream labels:
- The labels have a low cardinality, with at most tens of values.
- The values are long lived. For example, the first level of an HTTP path: /load, /save, and /update.
- The labels can be used in queries to improve query performance.
4.4. Analyzing Loki query performance
Every query and subquery in Loki generates a metrics.go log line with performance statistics. Subqueries emit the log line in the queriers. Every query also has a single associated summary metrics.go line emitted by the query frontend. Use these statistics to calculate the query performance metrics.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed and configured Loki Operator.
Procedure
- In the OpenShift Container Platform web console, navigate to Observe → Metrics.
Note the following values:
- duration: Denotes the amount of time a query took to run.
- queue_time: Denotes the time a query spent in the queue before being processed.
- chunk_refs_fetch_time: Denotes the amount of time spent in getting chunk information from the index.
- store_chunks_download_time: Denotes the amount of time in getting chunks from cache or storage.
Calculate the following performance metrics:
- Total query time as total_duration:
total_duration = duration + queue_time
- Percentage of the total duration that a query spent in the queue as Queue Time:
Queue Time = queue_time / total_duration * 100
- Percentage of the total duration that was spent in getting chunk information from the index as Chunk Refs Fetch Time:
Chunk Refs Fetch Time = chunk_refs_fetch_time / total_duration * 100
- Percentage of the total duration that was spent in getting chunks from cache or storage as Chunks Download Time:
Chunks Download Time = store_chunks_download_time / total_duration * 100
- Percentage of the total duration that was spent in executing the query as Execution Time:
Execution Time = (duration - chunk_refs_fetch_time - store_chunks_download_time) / total_duration * 100
- Refer to Query performance analysis to understand the reason for each metric and how each metric affects query performance.
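The calculations in this procedure can be sketched as a small helper. The function name query_breakdown and the sample values are hypothetical; the four inputs are assumed to be the duration, queue_time, chunk_refs_fetch_time, and store_chunks_download_time values from one metrics.go line, all expressed in the same time unit.

```python
def query_breakdown(duration, queue_time, chunk_refs_fetch_time,
                    store_chunks_download_time):
    """Return total_duration and the percentage metrics defined in this procedure."""
    total_duration = duration + queue_time
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return {
        "total_duration": total_duration,
        "Queue Time": queue_time / total_duration * 100,
        "Chunk Refs Fetch Time": chunk_refs_fetch_time / total_duration * 100,
        "Chunks Download Time": store_chunks_download_time / total_duration * 100,
        "Execution Time": (duration - chunk_refs_fetch_time
                           - store_chunks_download_time) / total_duration * 100,
    }


# Hypothetical values in seconds, not taken from a real cluster.
stats = query_breakdown(
    duration=6.0,
    queue_time=2.0,
    chunk_refs_fetch_time=1.0,
    store_chunks_download_time=2.0,
)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The four percentages add up to 100, which is a quick sanity check that the inputs were read from the same log line.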
4.5. Query performance analysis
For best query performance, you want to see as much time as possible spent in execution time, denoted by the Execution Time metric. See the table below for the reason other performance metrics might be higher and the steps you can take to improve them. You can also reduce the execution time by modifying your queries, thereby improving the overall performance.
| Issue | Reason | Fix |
|---|---|---|
| High Execution Time | Queries might be doing many CPU-intensive operations such as regular expression processing. | You can make the following changes: |
| | Your queries have many small log lines. | If your queries have many small lines, execution becomes dependent on how fast Loki can iterate the lines themselves. This becomes a CPU clock frequency bottleneck. To make things faster you need a faster CPU. |
| High Queue Time | You do not have enough queriers running. | The only fix is to increase the number of querier replicas in the |
| High Chunk Refs Fetch Time | Insufficient number of index-gateway replicas in the | Increase the number of index-gateway replicas or ensure they have enough CPU resources. |
| High Chunks Download Time | The chunks might be too small | Check the average chunk size by dividing |
| Query timing out | Query timeout value might be too low | Increase the |