Abstract
Chapter 1. Observability and Service Mesh
Red Hat OpenShift Observability provides real-time visibility, monitoring, and analysis of various system metrics, logs, and events to help you quickly diagnose and troubleshoot issues before they impact systems or applications.
1.1. About Observability and Service Mesh
Red Hat OpenShift Observability connects open source observability tools and technologies to create a unified Observability solution. The components of Red Hat OpenShift Observability work together to help you collect, store, deliver, analyze, and visualize data.
Red Hat OpenShift Service Mesh integrates with the following Red Hat OpenShift Observability components:
- OpenShift Monitoring
- Red Hat OpenShift distributed tracing platform
OpenShift Service Mesh also integrates with:
- Kiali provided by Red Hat, a powerful console for visualizing and managing your service mesh.
- OpenShift Service Mesh Console (OSSMC) plugin, an OpenShift Container Platform console plugin that seamlessly integrates Kiali console features into your OpenShift console.
The following components in OpenShift Service Mesh ambient mode generate detailed telemetry for all service communications within a mesh:
- Ztunnel generates Layer 4 (L4) telemetry, such as TCP metrics.
- Waypoint proxies generate Layer 7 (L7) telemetry, such as HTTP, HTTP/2, and gRPC traffic metrics, and distributed traces.
Chapter 2. Metrics and Service Mesh
2.1. Using metrics
You can use the OpenShift Container Platform monitoring stack and Red Hat OpenShift Service Mesh to track the health and performance of your applications. You can learn how to monitor metrics and alerts for both standard and ambient mesh modes.
2.1.1. About metrics
You can monitor service mesh application health and performance by using the platform monitoring stack to track Layer 4 (L4) and Layer 7 (L7) metrics across sidecar, ztunnel, and waypoint proxies.
Every OpenShift Container Platform installation deploys monitoring stack components by default, and the Cluster Monitoring Operator (CMO) manages them. These components include Prometheus, Alertmanager, Thanos Querier, and others. The CMO also deploys the Telemeter Client, which sends a subset of data from platform Prometheus instances to Red Hat to ease Remote Health Monitoring for clusters.
When you have added your application to the mesh, you can monitor the in-cluster health and performance of your applications running on OpenShift Container Platform with metrics and customized alerts for CPU and memory usage, network connectivity, and other resource usage.
When you have added your application to the mesh in ambient mode, you can monitor the Istio standard metrics of your application from the ztunnel resource and the waypoint proxies. The ztunnel also exposes a variety of DNS and debugging metrics.
Ambient mode uses two proxy layers, which results in two types of metrics for each application service. You can collect L4 TCP metrics from both the ztunnel and the waypoint proxies. You can collect L7 metrics, such as HTTP traffic metrics, from the waypoint proxies.
2.1.2. Configuring OpenShift Monitoring with Service Mesh
You can integrate Red Hat OpenShift Service Mesh with user-workload monitoring to enable observability in your service mesh. User-workload monitoring provides access to essential built-in tools. Kiali requires this feature to run the dedicated console for Istio.
Prerequisites
- You have installed the Red Hat OpenShift Service Mesh Operator.
- You have enabled user-workload monitoring.
  Note: You can enable user-workload monitoring by applying the `ConfigMap` change for metrics integration. For more information, see "Configuring user workload monitoring".
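A minimal sketch of that `ConfigMap` change, assuming the standard `cluster-monitoring-config` object in the `openshift-monitoring` namespace:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Deploys the user-workload monitoring stack alongside platform monitoring
    enableUserWorkload: true
```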
Procedure
1. Create a `Telemetry` resource in the Istio control plane namespace to ensure that Prometheus is a metrics provider, similar to the following example:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: enable-prometheus-metrics
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
```

2. Create a `ServiceMonitor` resource that monitors the Istio control plane, similar to the following example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-monitor
  namespace: istio-system
spec:
  targetLabels:
  - app
  selector:
    matchLabels:
      istio: pilot
  endpoints:
  - port: http-monitoring
    interval: 30s
```

3. Create a `PodMonitor` resource that collects metrics from the Istio proxies, similar to the following example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
  namespace: istio-system
spec:
  selector:
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - action: replace
      regex: (\\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
      replacement: '[\$2]:\$1'
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    - action: replace
      regex: (\\d+);((([0-9]+?)(\.|$)){4})
      replacement: \$2:\$1
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    # Set the 'app' label from 'app.kubernetes.io/name' or fall back to 'app'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_name", "__meta_kubernetes_pod_label_app"]
      separator: ";"
      targetLabel: "app"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Set the 'version' label from 'app.kubernetes.io/version' or fall back to 'version'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_version", "__meta_kubernetes_pod_label_version"]
      separator: ";"
      targetLabel: "version"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Additional labels
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - action: replace
      replacement: "mesh_id"
      targetLabel: mesh_id
```

where:
- `istio-system`: You must apply the `PodMonitor` object in all mesh namespaces, including the Istio control plane namespace, because OpenShift Container Platform monitoring ignores the `namespaceSelector` spec in `ServiceMonitor` and `PodMonitor` objects.
- `mesh_id`: Specify the actual mesh ID.
- `\\d+`: The additional backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\\d+` with `\d+`.
- `\$`: The backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\$` with `$`.
To validate that the `ServiceMonitor` and `PodMonitor` resources are monitoring the Istio control plane, go to the OpenShift Container Platform web console, navigate to Observe → Metrics, and run the query `istio_requests_total`. Confirm that metrics for Istio requests are displayed.
Note: The query can take a few minutes to return results.
2.1.3. Configuring OpenShift Monitoring with Service Mesh ambient mode
You can integrate Red Hat OpenShift Service Mesh with user-workload monitoring to enable observability in your service mesh ambient mode. User-workload monitoring provides access to essential built-in tools. Kiali requires this feature to run the dedicated console for Istio.
Prerequisites
- You have installed the Red Hat OpenShift Service Mesh Operator.
- You have enabled user-workload monitoring.
  Note: You can enable user-workload monitoring by applying the `ConfigMap` change for metrics integration. For more information, see "Configuring user workload monitoring".
Procedure
1. Create a `Telemetry` resource in the Istio control plane namespace to ensure that Prometheus is a metrics provider, similar to the following example:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: enable-prometheus-metrics
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
```

2. Create a `ServiceMonitor` resource that monitors the Istio control plane, similar to the following example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-monitor
  namespace: istio-system
spec:
  targetLabels:
  - app
  selector:
    matchLabels:
      istio: pilot
  endpoints:
  - port: http-monitoring
    interval: 30s
```

3. Create a `PodMonitor` resource in the `ztunnel` namespace for collecting the ztunnel metrics, similar to the following example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-ztunnel-monitor
  namespace: ztunnel
spec:
  selector:
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - action: replace
      regex: (\\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
      replacement: '[\$2]:\$1'
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    - action: replace
      regex: (\\d+);((([0-9]+?)(\.|$)){4})
      replacement: \$2:\$1
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    # Set the 'app' label from 'app.kubernetes.io/name' or fall back to 'app'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_name", "__meta_kubernetes_pod_label_app"]
      separator: ";"
      targetLabel: "app"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Set the 'version' label from 'app.kubernetes.io/version' or fall back to 'version'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_version", "__meta_kubernetes_pod_label_version"]
      separator: ";"
      targetLabel: "version"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Additional labels
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - action: replace
      replacement: "mesh_id"
      targetLabel: mesh_id
```

where:
- `mesh_id`: Specify the actual mesh ID.
- `\\d+`: The additional backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\\d+` with `\d+`.
- `\$`: The backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\$` with `$`.
4. Optional: Deploy a waypoint proxy to enable the Layer 7 (L7) OpenShift Service Mesh features in ambient mode:
   a. Deploy a waypoint proxy for the `bookinfo` namespace, similar to the following example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
    istio.io/waypoint-for: service
  name: waypoint
  namespace: bookinfo
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
```

   b. Enroll the namespace to use the waypoint by running the following command:

   $ oc label namespace bookinfo istio.io/use-waypoint=waypoint
5. Create a `PodMonitor` resource for collecting waypoint proxy metrics in an application namespace, such as `bookinfo`, similar to the following example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-waypoint-monitor
  namespace: bookinfo
spec:
  selector:
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - action: replace
      regex: (\\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
      replacement: '[\$2]:\$1'
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    - action: replace
      regex: (\\d+);((([0-9]+?)(\.|$)){4})
      replacement: \$2:\$1
      sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
      targetLabel: __address__
    # Set the 'app' label from 'app.kubernetes.io/name' or fall back to 'app'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_name", "__meta_kubernetes_pod_label_app"]
      separator: ";"
      targetLabel: "app"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Set the 'version' label from 'app.kubernetes.io/version' or fall back to 'version'
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_version", "__meta_kubernetes_pod_label_version"]
      separator: ";"
      targetLabel: "version"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "\${1}\${2}" # Use the first non-empty value
    # Additional labels
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - action: replace
      replacement: "mesh_id"
      targetLabel: mesh_id
```

where:
- `mesh_id`: Specify the actual mesh ID.
- `\\d+`: The additional backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\\d+` with `\d+`.
- `\$`: The backslash is only needed when you apply this replacement from a command line through a heredoc. If you apply it from a YAML file, replace `\$` with `$`.

Note: A waypoint proxy generates Layer 4 (L4) and L7 metrics and scopes these statistics by Envoy proxy function. The Envoy proxy documentation describes the statistics, for example, `Upstream connection`, `Listener`, `HTTP Connection Manager`, `TCP proxy`, and `Router`.
2.1.3.1. Verifying metrics in ambient mode
You can verify that the metrics for your application are available in the OpenShift Container Platform web console.
Prerequisites
- You have deployed the Bookinfo application in ambient mode to use the following example. For more information, see "Deploying the Bookinfo application in Istio ambient mode".
Procedure
1. In the OpenShift Container Platform web console, go to Observe → Targets.
2. Find the status of the metrics targets by searching for targets such as `istiod-monitor`, `istio-ztunnel-monitor`, and `istio-waypoint-monitor`. The `istio-waypoint-monitor` target exists only if you created the waypoint to use Layer 7 (L7) OpenShift Service Mesh features.
   Note: The `ServiceMonitor` resource configuration can take a few minutes to appear in the metrics targets results.
3. Send some traffic to the Bookinfo `productpage` service to generate metrics by running the following command:

   $ curl "http://${GATEWAY_URL}/productpage" | grep "<title>"

4. In the OpenShift Container Platform web console, go to Observe → Metrics and run a query such as `istio_build`, `istio_tcp_received_bytes_total`, or `istio_requests_total`.
2.1.4. Additional resources
Chapter 3. Distributed tracing and Service Mesh
3.1. Configuring Red Hat OpenShift distributed tracing platform with Service Mesh
Integrate Red Hat OpenShift distributed tracing platform with Red Hat OpenShift Service Mesh by using Red Hat OpenShift distributed tracing platform (Tempo) for trace storage and Red Hat OpenShift distributed tracing data collection for standardized telemetry data collection and processing.
3.1.1. About Red Hat OpenShift distributed tracing platform and Red Hat OpenShift Service Mesh
The integration of Red Hat OpenShift distributed tracing platform with Red Hat OpenShift Service Mesh consists of two parts: Red Hat OpenShift distributed tracing platform (Tempo) and Red Hat OpenShift distributed tracing data collection.
- Red Hat OpenShift distributed tracing platform (Tempo)
Provides distributed tracing platform to monitor and troubleshoot transactions in complex distributed systems. Tempo derives its core functionality from the open source Grafana Tempo project.
For more information about distributed tracing platform (Tempo), its features, installation, and configuration, see "Red Hat OpenShift distributed tracing platform (Tempo)".
- Red Hat OpenShift distributed tracing data collection
Derives its core functionality from the open source "OpenTelemetry project", which aims to offer unified, standardized, and vendor-neutral telemetry data collection for cloud-native software. The Red Hat OpenShift distributed tracing data collection product supports deploying and managing the OpenTelemetry Collector and simplifies the instrumentation of workloads.
The "OpenTelemetry Collector" can receive, process, and forward telemetry data in many formats, making it the ideal component for telemetry processing and interoperability between telemetry systems. The Collector provides a unified solution for collecting and processing metrics, traces, and logs.
For more information about distributed tracing data collection, its features, installation, and configuration, see: "Red Hat OpenShift distributed tracing data collection".
3.1.2. Configuring Red Hat OpenShift distributed tracing data collection with Service Mesh
You can integrate Red Hat OpenShift Service Mesh with Red Hat OpenShift distributed tracing data collection to instrument, generate, collect, and export OpenTelemetry traces, metrics, and logs to analyze and understand the performance and behavior of the software.
Prerequisites
- You have installed the Tempo Operator. For more information, see "Installing the Tempo Operator".
- You have installed the Red Hat OpenShift distributed tracing data collection Operator. For more information, see "Installing the Red Hat build of OpenTelemetry".
- You have installed a `TempoStack` instance and configured it in a `tempo` namespace. For more information, see "Installing a TempoStack instance".
- You have created an `Istio` instance.
- You have created an `IstioCNI` instance.
Procedure
1. Navigate to the Red Hat OpenShift distributed tracing data collection Operator and install the `OpenTelemetryCollector` resource in the `istio-system` namespace, similar to the following example:

```yaml
kind: OpenTelemetryCollector
apiVersion: opentelemetry.io/v1beta1
metadata:
  name: otel
  namespace: istio-system
spec:
  observability:
    metrics: {}
  deploymentUpdateStrategy: {}
  config:
    exporters:
      otlp:
        endpoint: 'tempo-sample-distributor.tempo.svc.cluster.local:4317'
        tls:
          insecure: true
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: '0.0.0.0:4317'
          http: {}
    service:
      pipelines:
        traces:
          exporters:
          - otlp
          receivers:
          - otlp
```

2. Update the Red Hat OpenShift Service Mesh `Istio` custom resource (CR) to enable tracing and define the distributed tracing data collection tracing provider in `meshConfig`, similar to the following example:

```yaml
apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  # ...
  name: default
spec:
  namespace: istio-system
  # ...
  values:
    meshConfig:
      enableTracing: true
      extensionProviders:
      - name: otel
        opentelemetry:
          port: 4317
          service: otel-collector.istio-system.svc.cluster.local
```

   where `spec.values.meshConfig.extensionProviders[].opentelemetry.service` is the OpenTelemetry collector service in the `istio-system` namespace.

3. Create an Istio `Telemetry` resource to enable the tracers defined in `spec.values.meshConfig.extensionProviders`, similar to the following example:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: otel-demo
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: otel
    randomSamplingPercentage: 100
```

   After you verify that you can see traces, lower the `randomSamplingPercentage` value or set it to `default` to reduce the number of requests.

   Note: You can use a single Istio `Telemetry` resource for both the Prometheus metrics provider and a tracing provider by setting `spec.metrics.overrides.disabled` to `false`, which enables the Prometheus metrics provider. This is optional, and you can skip it if you configured metrics through the OpenShift Cluster Monitoring method described earlier.

4. Create the `bookinfo` namespace by running the following command:

   $ oc create ns bookinfo

5. Depending on the update strategy you are using, enable sidecar injection in the namespace by running the appropriate commands:
   - If you are using the `InPlace` update strategy, run the following command:

     $ oc label namespace bookinfo istio-injection=enabled

   - If you are using the `RevisionBased` update strategy, run the following commands:
     a. Display the revision name by running the following command:

        $ oc get istiorevisions.sailoperator.io

        You should see output similar to the following example:

        NAME      TYPE    READY   STATUS    IN USE   VERSION   AGE
        default   Local   True    Healthy   True     v1.24.3   3m33s

     b. Label the namespace with the revision name to enable sidecar injection by running the following command:

        $ oc label namespace bookinfo istio.io/rev=default

6. Deploy the `bookinfo` application in the `bookinfo` namespace by running the following command:

   $ oc apply -f https://raw.githubusercontent.com/openshift-service-mesh/istio/release-1.24/samples/bookinfo/platform/kube/bookinfo.yaml -n bookinfo

7. Generate traffic to the `productpage` pod to generate traces by running the following command:

   $ oc exec -it -n bookinfo deployments/productpage-v1 -c istio-proxy -- curl localhost:9080/productpage

8. Validate the integration by finding the route for the Jaeger UI and checking for traces in the UI:

   $ oc get routes -n tempo tempo-sample-query-frontend

   Note: You must create the OpenShift route for the Jaeger UI in the Tempo namespace. You can either create it manually for the `tempo-sample-query-frontend` service, or update the `Tempo` custom resource with `.spec.template.queryFrontend.jaegerQuery.ingress.type: route`.
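For the route option mentioned in the note, a sketch of the `TempoStack` update follows; the instance name `sample` and the `tempo` namespace are assumed from this example deployment:

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: sample
  namespace: tempo
spec:
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          # Creates an OpenShift route for the Jaeger UI
          type: route
```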
3.1.3. Configuring Red Hat OpenShift distributed tracing platform (Tempo) with Service Mesh ambient mode
Generate Layer 7 (L7) tracing spans in OpenShift Service Mesh ambient mode by using waypoint or gateway proxies to capture application-level telemetry that the Layer 4 (L4) ztunnel component does not offer.
Prerequisites
- You have installed the Tempo Operator. For more information, see "Installing the Tempo Operator".
- You have installed the Red Hat OpenShift distributed tracing data collection Operator. For more information, see "Installing the Red Hat build of OpenTelemetry".
-
You have installed a
TempoStackand configured it in atemponamespace. For more information, see "Installing aTempoStackinstance". -
You have created an
Istioinstance.
Procedure
1. Navigate to the Red Hat OpenShift distributed tracing data collection Operator and install the `OpenTelemetryCollector` resource in the `istio-system` namespace, similar to the following example:

```yaml
kind: OpenTelemetryCollector
apiVersion: opentelemetry.io/v1beta1
metadata:
  name: otel
  namespace: istio-system
spec:
  mode: deployment
  observability:
    metrics: {}
  deploymentUpdateStrategy: {}
  config:
    exporters:
      otlp:
        endpoint: 'tempo-sample-distributor.tempo.svc.cluster.local:4317'
        tls:
          insecure: true
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: '0.0.0.0:4317'
          http: {}
    service:
      pipelines:
        traces:
          exporters:
          - otlp
          receivers:
          - otlp
```

   where `spec.config.exporters.otlp.endpoint` defines the Tempo sample distributor service in a namespace such as `tempo`.

2. Update the Red Hat OpenShift Service Mesh `Istio` custom resource (CR) to define a tracing provider in the `spec.values.meshConfig` field, similar to the following example:

```yaml
apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  # ...
  name: default
spec:
  namespace: istio-system
  profile: ambient
  # ...
  values:
    meshConfig:
      enableTracing: true
      extensionProviders:
      - name: otel
        opentelemetry:
          port: 4317
          service: otel-collector.istio-system.svc.cluster.local
    pilot:
      trustedZtunnelNamespace: ztunnel
```

   where `spec.values.meshConfig.extensionProviders[].opentelemetry.service` defines the OpenTelemetry collector service in the `istio-system` namespace.

3. Create an Istio `Telemetry` CR to enable the tracing provider defined in the `spec.values.meshConfig.extensionProviders` field, similar to the following example:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: otel-demo
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: otel
    randomSamplingPercentage: 100
```

   Note: After you can see the traces, lower the `randomSamplingPercentage` value or set it to `default` to reduce the number of requests. You can also use the `spec.targetRefs` field to enable tracing at the gateway or waypoint level.

4. Optional: Use a single Istio `Telemetry` resource for both a Prometheus metrics provider and a tracing provider by setting the `spec.metrics.overrides.disabled` field to `false`, which enables the Prometheus metrics provider. You do not need this step if you configured metrics through the OpenShift Cluster Monitoring approach described earlier.
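As a sketch of that optional combined resource, assuming the Telemetry API's list form for `metrics.overrides` (adjust the resource name and provider names to your mesh):

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: otel-demo
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - disabled: false   # keep standard metrics enabled
  tracing:
  - providers:
    - name: otel
    randomSamplingPercentage: 100
```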
3.1.3.1. Verifying traces in ambient mode
You can verify that traces for your application are generated in ambient mode. The following example uses the Bookinfo application.
Prerequisites
- You have deployed the Bookinfo application in ambient mode to use the following example. For more information, see "Deploying the Bookinfo application in Istio ambient mode".
- You have deployed a waypoint proxy and enrolled the `bookinfo` namespace to use the waypoint. For more information, see "Deploying a waypoint proxy".
Procedure
1. Send some traffic to the Bookinfo `productpage` service to generate traces by running the following command:

   $ curl "http://${GATEWAY_URL}/productpage" | grep "<title>"

2. Verify that the Bookinfo application traces appear in the Tempo dashboard UI. Find the dashboard route by running the following command:

   $ oc get routes -n tempo tempo-sample-query-frontend

3. Select the `bookinfo-gateway-istio.bookinfo` or the `waypoint.bookinfo` service from the dashboard UI, and click Find Traces.

Note: The `TempoStack` custom resource (CR) creates the route for the Tempo dashboard UI when you set the `.spec.template.queryFrontend.jaegerQuery.ingress.type` field to `route`.
3.1.4. Additional resources
- Grafana Tempo
- Red Hat OpenShift distributed tracing platform (Tempo)
- OpenTelemetry project
- OpenTelemetry Collector
- Red Hat OpenShift distributed tracing data collection
- Installing the Tempo Operator
- Installing the Red Hat build of OpenTelemetry
- Installing a TempoStack instance
- Deploying the Bookinfo application in Istio ambient mode
- Deploying a waypoint proxy
Chapter 4. Kiali Operator provided by Red Hat
4.1. Using Kiali Operator provided by Red Hat
Once you have added your application to the mesh, you can use Kiali Operator provided by Red Hat to view the data flow through your application.
4.1.1. About Kiali
You can use Kiali Operator provided by Red Hat to view configurations, monitor traffic, and analyze traces in a single console. Kiali Operator provided by Red Hat derives its core functionality from the open source Kiali project.
Kiali Operator provided by Red Hat is the management console for Red Hat OpenShift Service Mesh. It provides dashboards, observability, and robust configuration and validation capabilities. It shows the structure of your service mesh by inferring traffic topology and displays the health of your mesh. Kiali provides detailed metrics, powerful validation, access to Grafana, and strong integration with the Red Hat OpenShift distributed tracing platform (Tempo).
4.1.2. About Kiali and Istio ambient mode
When running in Istio ambient mode, Kiali introduces new behaviors and visualizations to support the Ambient data plane. The following information describes key aspects of Kiali in this context:
- Access requirements
  Kiali requires access to the `ztunnel` namespace to detect whether ambient mode is enabled. Without this access, Kiali does not display ambient-related features.
- Visualizations and features
- Kiali displays ambient badges for namespaces and workloads you enrolled in the ambient mesh, enabling quick identification.
- Traffic graph adjustments
Ambient mode introduces new telemetry sources. Kiali collects and displays metrics from both ztunnel and waypoint proxies to give complete visibility into mesh traffic. You can focus on ambient-specific traffic sources by using new filters and selectors in Kiali. Kiali provides a display option for visualizing waypoint nodes in the traffic graph.
The traffic graph changes based on the ambient enrollment:
- Without waypoint proxies, the traffic graph displays only Layer 4 (L4) traffic.
- With waypoint proxies, the graph includes Layer 7 (L7) traffic and might also include L4 traffic.
- Workload proxy logs
- Kiali aggregates and filters logs from both ztunnel and waypoint proxies. This unified view simplifies troubleshooting by showing only the relevant log entries for each workload.
- Distributed tracing
- Tracing data is available only after you deploy waypoint proxies, because waypoint services generate the traces. Kiali automatically correlates workload traces with their associated waypoint proxies.
- Dedicated pages for ambient components
  Analyze ambient components separately from workloads and services on the following dedicated pages:
  - Waypoint pages display detailed information about captured workloads.
  - Ztunnel pages focus on telemetry, metrics, and diagnostics, based on data from `istioctl` utilities.
Kiali integration with ambient mode ensures full observability for workloads running in the ambient mesh and simplifies operational monitoring and troubleshooting tasks.
4.1.3. Installing the Kiali Operator provided by Red Hat
The following steps show how to install the Kiali Operator provided by Red Hat.
Do not install the Community version of the Operator. The Community version is not supported.
Prerequisites
- You have access to the Red Hat OpenShift Service Mesh web console.
Procedure
- Log in to the Red Hat OpenShift Service Mesh web console.
- Navigate to Operators → OperatorHub.
- Type Kiali into the filter box to find the Kiali Operator provided by Red Hat.
- Click Kiali Operator provided by Red Hat to display information about the Operator.
- Click Install.
- On the Operator Installation page, select the stable Update Channel.
- Select All namespaces on the cluster (default). This installs the Operator in the default `openshift-operators` project and makes the Operator available to all projects in the cluster.
- Select the Automatic Approval Strategy.
NoteThe Manual approval strategy requires a user with appropriate credentials to approve the Operator installation and subscription process.
- Click Install.
- The Installed Operators page displays the Kiali Operator’s installation progress.
4.1.4. Configuring OpenShift Monitoring with Kiali
The following steps show how to integrate the Kiali Operator provided by Red Hat with user-workload monitoring.
Prerequisites
- You have installed Red Hat OpenShift Service Mesh.
- You have enabled user-workload monitoring. See "Enabling monitoring for user-defined projects".
- You have configured OpenShift Monitoring with Service Mesh. See "Configuring OpenShift Monitoring with Service Mesh".
- You have Kiali Operator provided by Red Hat 2.4 installed.
Procedure
1. Create a `ClusterRoleBinding` resource for Kiali, similar to the following example:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kiali-monitoring-rbac
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- kind: ServiceAccount
  name: kiali-service-account
  namespace: istio-system
```

2. Create a `Kiali` resource and point it to your Istio instance, similar to the following example:

```yaml
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali-user-workload-monitoring
  namespace: istio-system
spec:
  external_services:
    prometheus:
      auth:
        type: bearer
        use_kiali_token: true
      thanos_proxy:
        enabled: true
      url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
```

3. When the `Kiali` resource is ready, get the Kiali URL from the route by running the following command:

   $ echo "https://$(oc get routes -n istio-system kiali -o jsonpath='{.spec.host}')"

4. Follow the URL to open Kiali in your web browser.
5. Navigate to the Traffic Graph tab to check the traffic in the Kiali UI.
4.1.5. Integrating Red Hat OpenShift distributed tracing platform with Kiali Operator provided by Red Hat
You can integrate Red Hat OpenShift distributed tracing platform with Kiali Operator provided by Red Hat, which enables the following features:
- Display trace overlays and details on the graph.
- Display scatterplot charts and in-depth trace/span information on detail pages.
- Integrate span information in logs and metric charts.
- Provide links to the external tracing UI.
4.1.5.1. Configuring Red Hat OpenShift distributed tracing platform with Kiali Operator provided by Red Hat
Analyze service communication and troubleshoot request flows within the mesh by viewing distributed traces directly in the Kiali console.
Prerequisites
- You have installed Red Hat OpenShift Service Mesh.
- You have configured distributed tracing platform with Red Hat OpenShift Service Mesh.
Procedure
Update the `Kiali` resource `spec` configuration for tracing.

Example `Kiali` resource `spec` configuration for tracing:

```yaml
spec:
  external_services:
    tracing:
      enabled: true
      provider: tempo
      use_grpc: false
      internal_url: https://tempo-sample-gateway.tempo.svc.cluster.local:8080/api/traces/v1/default/tempo
      external_url: https://tempo-sample-gateway-tempo.apps-crc.testing/api/traces/v1/default/search
      health_check_url: https://tempo-sample-gateway-tempo.apps-crc.testing/api/traces/v1/default/tempo/api/echo
      auth:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
        insecure_skip_verify: false
        type: bearer
        use_kiali_token: true
      tempo_config:
        url_format: "jaeger"
```
- `spec.external_services.tracing.enabled` specifies whether you have enabled tracing.
- `spec.external_services.tracing.provider` specifies either distributed tracing platform (Tempo) or distributed tracing platform (Jaeger). The distributed tracing platform can expose a Jaeger API or a Tempo API.
- `spec.external_services.tracing.internal_url` specifies the internal URL for the Tempo API. When you deploy the distributed tracing platform in multitenancy mode, include the tenant name in the URL path of the `internal_url` parameter. In this example, `default` represents the tenant name.
- `spec.external_services.tracing.external_url` specifies the external URL for the Jaeger UI. When you deploy the distributed tracing platform in multitenancy mode, the gateway creates the route. Otherwise, you must create the route in the Tempo namespace. You can manually create the route for the `tempo-sample-query-frontend` service or update the `Tempo` custom resource with `.spec.template.queryFrontend.jaegerQuery.ingress.type: route`.
- `spec.external_services.tracing.health_check_url` specifies the health check URL. Not required by default. When you deploy the distributed tracing platform in multitenancy mode, it does not expose the default health check URL. This is an example of a valid health check URL.
- `spec.external_services.tracing.auth` specifies the configuration used when the access URL is HTTPS or requires authentication. Not required by default.
- `spec.external_services.tracing.tempo_config.url_format` specifies a setting that defaults to `grafana`. Not required by default. Change it to `jaeger` if the Kiali "View in tracing" link redirects to the Jaeger console UI.
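As a sketch of the non-multitenant route option mentioned in the `external_url` callout, you can update the Tempo custom resource so that the Operator creates the route for you. The `apiVersion`, `kind`, and resource names below are assumptions for illustration; only the `.spec.template.queryFrontend.jaegerQuery.ingress.type: route` path comes from the procedure above:

```yaml
apiVersion: tempo.grafana.com/v1alpha1  # assumed API version for the Tempo Operator CR
kind: TempoStack                        # assumed kind; adjust to your Tempo deployment
metadata:
  name: sample
  namespace: tempo
spec:
  template:
    queryFrontend:
      jaegerQuery:
        ingress:
          type: route  # ask the Operator to create the route for the Jaeger UI
```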
Save the updated `spec` in `kiali_cr.yaml`, then apply the configuration by running the following command:
$ oc patch -n istio-system kiali kiali --type merge -p "$(cat kiali_cr.yaml)"
Example output:
kiali.kiali.io/kiali patched
Verification
Run the following command to get the Kiali route:
$ oc get route kiali -n istio-system
- Navigate to the Kiali UI.
- Navigate to Workload → Traces tab to see traces in the Kiali UI.
4.1.6. External Kiali deployment model
Large mesh deployments can separate mesh operation from mesh observability by deploying Kiali away from the mesh. This separation provides dedicated management of observability, reduced resource consumption on mesh clusters, centralized visibility, and improved security isolation.
The external deployment model requires a minimum of two clusters:
- Management cluster: The home cluster where you deploy Kiali.
- Mesh clusters: The remote clusters where you deploy the service mesh.
In this model, Kiali is not co-located with an Istio control plane. You can also colocate other observability tools, such as Prometheus, on the management cluster to improve metric query performance.
4.1.6.1. Installing Kiali Operator on remote clusters
In an external deployment, you must install the Kiali Operator on all clusters, including those where Kiali is not deployed, to ensure the creation of required namespaces and remote cluster resources.
Prerequisites
- You have logged in to the OpenShift Container Platform web console as a user with the `cluster-admin` role.
- You have Istio installed in a multi-cluster configuration on each cluster.
- You have configured a metrics store so that Kiali can query metrics from all the clusters. Kiali queries metrics and traces from their endpoints.
- You have the necessary secrets for Kiali to access remote clusters.
- You have set the `clustering.ignore_home_cluster` field to `true` in the `Kiali` custom resource (CR).
- You have given a unique cluster name for the Kiali home cluster in the `spec.kubernetes_config.cluster_name` specification. In an external deployment, you must manually set this name because there is no colocated Istio control plane to provide it.
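The two prerequisite settings above can be combined in the `Kiali` CR; the following is a minimal sketch, assuming both fields sit under `spec` and using an illustrative cluster name:

```yaml
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  clustering:
    ignore_home_cluster: true  # the Kiali home cluster is outside the mesh
  kubernetes_config:
    cluster_name: kiali-home   # illustrative unique name; no colocated control plane provides one
```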
Procedure
- Deploy the Kiali Operator on all clusters using the procedure "Installing Kiali in a multi-cluster mesh".
For clusters where Kiali is not deployed, configure the `Kiali` CR to create only the remote cluster resources by setting the `spec.deployment.remote_cluster_resources_only` field to `true`, similar to the following example:

```yaml
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  version: default
  auth:
  deployment:
    remote_cluster_resources_only: true
```

Ensure the Kiali namespace and instance name are consistent across all clusters. If you change the default namespace (`istio-system`) or instance name (`kiali`), you must apply the same values to the following `Kiali` CR settings on every cluster:

- `spec.deployment.namespace`
- `spec.deployment.instance_name`
4.1.7. Additional resources
4.2. OpenShift Service Mesh Console plugin
The OpenShift Service Mesh Console (OSSMC) plugin extends the OpenShift Container Platform web console with a Service Mesh menu and enhanced tabs for workloads and services.
4.2.1. About OpenShift Service Mesh Console plugin
The OpenShift Service Mesh Console (OSSMC) plugin is an extension to OpenShift Container Platform web console that provides visibility into your Service Mesh.
The OSSMC plugin supports only one Kiali instance, regardless of its project access scope.
The OSSMC plugin provides a new category, Service Mesh, in the main OpenShift Container Platform web console navigation with the following menu options:
- Overview
- Provides a summary of your mesh, displayed as cards that represent the namespaces in the mesh.
- Traffic Graph
- Provides a full topology view of your mesh, represented by nodes and edges. Each node represents a component of the mesh and each edge represents traffic flowing through the mesh between components.
- Istio config
- Provides a list of all Istio configuration files in your mesh, with a column that provides a quick way to know if the configuration for each resource is valid.
- Mesh
- Provides detailed information about the Istio infrastructure status. It shows an infrastructure topology view with core and add-on components, their health, and how they connect to each other.
In the web console Workloads details page, the OSSMC plugin adds a Service Mesh tab that has the following subtabs:
- Overview
- Shows a summary of the selected workload, including a localized topology graph showing the workload with all inbound and outbound edges and nodes.
- Traffic
- Shows information about all inbound and outbound traffic to the workload.
- Logs
- Shows the logs for the workload’s containers. You can view container logs individually, ordered by log time, and see how the Envoy sidecar proxy logs relate to your workload’s application logs. You can enable tracing span integration to see logs that correspond to specific trace spans.
- Metrics
- Shows inbound and outbound metric graphs in the corresponding subtabs. All the workload metrics are here, providing a detailed view of the performance of your workload. You can enable the tracing span integration to see spans that occurred at the same time as the metrics. With the span marker in the graph, you can see the specific spans associated with that time frame.
- Traces
- Provides a chart showing the trace spans collected over the given time frame. The trace spans show the lowest-level detail within your workload application. The trace details further show heatmaps that offer a comparison of one span in relation to other requests and spans in the same time frame.
- Envoy
- Shows information about the Envoy sidecar configuration.
In the web console Networking details page, the OSSMC plugin adds a Service Mesh tab similar to the Workloads details page.
In the web console Projects details page, the OSSMC plugin adds a Service Mesh tab that provides traffic graph information about that project. It is the same information shown in the Traffic Graph page but specific to that project.
4.2.2. About installing OpenShift Service Mesh Console plugin
Install the OSSMC plugin by creating an OSSMConsole resource with the Kiali Operator to enable integrated service mesh management within the OpenShift console.
You must install the latest version of the Kiali Operator, even when installing an earlier OSSMC plugin version, because it includes the latest z-stream release.
OSSM version compatibility (the column headings are reconstructed from context):

| OSSM | OSSMC | Kiali | OpenShift Container Platform |
|---|---|---|---|
| 3.1 | v2.11 | v2.11 | 4.16+ |
| 3.0 | v2.4 | v2.4 | 4.15+ |
| 2.6 | v1.73 | v1.73 | 4.15-4.18 |
| 2.5 | v1.73 | v1.73 | 4.14-4.18 |
You can install the OSSMC plugin by using the OpenShift Container Platform web console or the OpenShift CLI (oc).
The OSSMC plugin is supported only on OpenShift Container Platform 4.15 and later. On OpenShift Container Platform 4.14, only the standalone Kiali console is available.
4.2.2.1. Installing OSSMC plugin by using the OpenShift Container Platform web console
You can install the OpenShift Service Mesh Console (OSSMC) plugin by using the OpenShift Container Platform web console.
Prerequisites
- You have administrator access to the OpenShift Container Platform web console.
- You have installed the OpenShift Service Mesh (OSSM).
- You have installed the `Istio` control plane from OSSM 3.0.
- You have installed Kiali Server 2.4.
Procedure
- Navigate to Installed Operators.
- Click Kiali Operator provided by Red Hat.
- Click Create instance on the Red Hat OpenShift Service Mesh Console tile. Alternatively, click the Create OSSMConsole button under the OpenShift Service Mesh Console tab.
Use the Create OSSMConsole form to create an instance of the `OSSMConsole` custom resource (CR). Name and Version are the required fields.

Note: The Version field must match the `spec.version` field in your Kiali custom resource (CR). If the Version value is the string `default`, the Kiali Operator installs the OpenShift Service Mesh Console (OSSMC) with the same version as the Operator. The `spec.version` field requires the `v` prefix in the version number. The version number must include only the major and minor version numbers (not the patch number); for example: `v1.73`.

- Click Create.
Verification
- Wait for the web console to confirm the OSSMC plugin installation and prompt you to refresh.
- Verify that the Service Mesh category shows up in the main OpenShift Container Platform web console navigation.
4.2.2.2. Installing OSSMC plugin by using the CLI
You can install the OpenShift Service Mesh Console (OSSMC) plugin by using the OpenShift CLI.
Prerequisites
- You have access to the OpenShift CLI (`oc`) on the cluster as an administrator.
- You have installed OpenShift Service Mesh (OSSM).
- You have installed the `Istio` control plane from OSSM 3.0.
- You have installed Kiali Server 2.4.
Procedure
Create an `OSSMConsole` custom resource (CR) to install the plugin by running the following command:

```shell
$ cat <<EOM | oc apply -f -
apiVersion: kiali.io/v1alpha1
kind: OSSMConsole
metadata:
  namespace: openshift-operators
  name: ossmconsole
spec:
  version: default
EOM
```
Note: The OpenShift Service Mesh Console (OSSMC) version must match the Kiali Server version. If the `spec.version` field value is the string `default` or is not specified, the Kiali Operator installs OSSMC with the same version as the Operator. The `spec.version` field requires the `v` prefix in the version number. The version number must include only the major and minor version numbers (not the patch number); for example: `v1.73`.

The plugin resources deploy in the same namespace as the `OSSMConsole` CR.

Optional: If you installed more than one Kiali Server in the cluster, specify the `spec.kiali` setting in the `OSSMConsole` CR, similar to the following example:

```shell
$ cat <<EOM | oc apply -f -
apiVersion: kiali.io/v1alpha1
kind: OSSMConsole
metadata:
  namespace: openshift-operators
  name: ossmconsole
spec:
  kiali:
    serviceName: kiali
    serviceNamespace: istio-system-two
    servicePort: 20001
EOM
```
Verification
- Go to the OpenShift Container Platform web console.
- Verify that the Service Mesh category shows up in the main OpenShift Container Platform web console navigation.
- Wait for the web console to confirm the OSSMC plugin installation and prompt you to refresh.
4.2.2.3. About uninstalling OpenShift Service Mesh Console plugin
You can uninstall the OSSMC plugin by using the OpenShift Container Platform web console or the OpenShift CLI (oc).
You must uninstall the OSSMC plugin before removing the Kiali Operator. Deleting the Operator first might leave the OSSMConsole and Kiali CRs stuck, requiring manual removal of the finalizer. If needed, remove the finalizer by running the following command, where <custom_resource_type> is kiali or ossmconsole:

$ oc patch <custom_resource_type> <custom_resource_name> -n <custom_resource_namespace> -p '{"metadata":{"finalizers": []}}' --type=merge

4.2.2.4. Uninstalling OSSMC plugin by using the web console
You can uninstall the OpenShift Service Mesh Console (OSSMC) plugin by using the OpenShift Container Platform web console.
Procedure
- Navigate to Installed Operators.
- Click Kiali Operator.
- Select the OpenShift Service Mesh Console tab.
- Click the Delete OSSMConsole option from the entry menu.
- Confirm that you want to delete the plugin.
4.2.2.5. Uninstalling OSSMC plugin by using the CLI
You can uninstall the OpenShift Service Mesh Console (OSSMC) plugin by using the OpenShift CLI (oc).
Procedure
Remove the OSSMC custom resource (CR) by running the following command:
$ oc delete ossmconsoles <custom_resource_name> -n <custom_resource_namespace>
Verification
Verify that you deleted all the CRs from all namespaces by running the following command:
$ for r in $(oc get ossmconsoles --ignore-not-found=true --all-namespaces -o custom-columns=NS:.metadata.namespace,N:.metadata.name --no-headers | sed 's/ */:/g'); do oc delete ossmconsoles -n $(echo $r|cut -d: -f1) $(echo $r|cut -d: -f2); done