Alerts in the Telemetry Operator
You create alert rules in Prometheus. The Prometheus servers send the alerts to an Alertmanager instance that manages the alerts. You create alert routes in Alertmanager to silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
- Enable Alertmanager for the Telemetry Operator.
- Create an alert rule in Prometheus.
- Create an alert route in Alertmanager.
Prerequisites
- The Red Hat OpenStack Services on OpenShift (RHOSO) environment is deployed on a Red Hat OpenShift Container Platform (RHOCP) cluster. For more information, see Deploying Red Hat OpenStack Services on OpenShift.
- You are logged on to a workstation that has access to the RHOCP cluster as a user with `cluster-admin` privileges.
- The Telemetry service is enabled and configured on the control plane. For more information, see the `telemetry` configuration in Creating the control plane.
Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If a rule condition returns an empty result set, the condition is false and no alert notification is triggered. If a rule condition returns a set of results, the condition is true and Prometheus triggers an alert notification.
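This empty-versus-non-empty evaluation logic can be sketched in a few lines of Python. This is a conceptual illustration only, not Prometheus code; the `evaluate_rule` function and sample data are assumptions made for the example:

```python
# Conceptual sketch: a Prometheus alert rule fires only when its
# expression returns a non-empty result set.

def evaluate_rule(query_results):
    """Return True (alert fires) if the rule expression matched any series."""
    # An empty result set means the condition is false: no alert.
    # Any returned series means the condition is true: an alert fires.
    return len(query_results) > 0

# An `up == 0` style query: only targets that are down appear in the result.
targets = {"node-exporter": 1, "ceilometer": 0}
down = [name for name, up in targets.items() if up == 0]

assert evaluate_rule([]) is False   # all targets healthy: no alert
assert evaluate_rule(down) is True  # "ceilometer" is down: alert fires
```

This is also why the example rules below filter with matchers such as `phase!="Running"`: healthy pods drop out of the result set, so the rule only fires when something matches.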
Procedure
- Create a file on your workstation that defines a `PrometheusRule` CR that contains the alert rules, for example, `openstack-observability-services-alerts.yaml`, and define the alert rules required for your environment. The following example defines four sample rules that you can use to get an alert if a component is down in your deployment:

  ```yaml
  apiVersion: monitoring.rhobs/v1
  kind: PrometheusRule
  metadata:
    labels:
      service: metricStorage
    name: openstack-observability-services-alerts
    namespace: openstack
  spec:
    groups:
    - name: openstack-observability.services.status
      rules:
      - alert: OpenStackServicesDownWarning
        expr: |
          (
            kube_pod_info{created_by_kind=~"ReplicaSet|StatefulSet|OpenStackClient"}
            * on(uid) group_left(phase)
            (kube_pod_status_phase{phase!="Running"} == 1)
          )
        for: 10s
        labels:
          severity: warning
        annotations:
          summary: "OpenStack Service pod not running (warning)"
          description: |
            Pod {{ $labels.pod }} (controlled by {{ $labels.created_by_name }})
            in namespace {{ $labels.namespace }} is not Running.
            Current state: {{ $labels.phase }}
            This has been the case for more than 10 seconds.
      - alert: OpenStackServicesDownCritical
        expr: |
          (
            kube_pod_info{created_by_kind=~"ReplicaSet|StatefulSet|OpenStackClient"}
            * on(uid) group_left(phase)
            (kube_pod_status_phase{phase!="Running"} == 1)
          )
        for: 60s
        labels:
          severity: critical
        annotations:
          summary: "OpenStack Service pod not running (critical)"
          description: |
            Pod {{ $labels.pod }} (controlled by {{ $labels.created_by_name }})
            in namespace {{ $labels.namespace }} is not Running.
            Current state: {{ $labels.phase }}
            This has been the case for more than 1 minute.
    - name: openstack-observability.scrapeconfig.status
      rules:
      - alert: OpenStackObservabilityDownWarning
        expr: |
          (
            up{job=~"scrapeConfig/openstack/telemetry-(ceilometer|podman-exporter|node-exporter)"} == 0
            or
            up{service="metric-storage-prometheus"} == 0
          )
        for: 10s
        labels:
          severity: warning
        annotations:
          summary: "Telemetry component down (warning)"
          description: |
            One of the RHOSO observability scrapeconfigs is down for more than 10 seconds.
            Instance: {{ $labels.instance }}
            Job: {{ $labels.job }}
      - alert: OpenStackObservabilityDownCritical
        expr: |
          (
            up{job=~"scrapeConfig/openstack/telemetry-(ceilometer|podman-exporter|node-exporter)"} == 0
            or
            up{service="metric-storage-prometheus"} == 0
          )
        for: 60s
        labels:
          severity: critical
        annotations:
          summary: "Telemetry component down (critical)"
          description: |
            One of the RHOSO observability scrapeconfigs is down for more than 1 minute.
            Instance: {{ $labels.instance }}
            Job: {{ $labels.job }}
  ```

  `metadata.labels.service`: This field must be set to `metricStorage` to ensure that the Prometheus instance managed by `telemetry-operator` loads the rule and that the rule becomes visible in the Prometheus dashboard.
  For more information about how to configure alerting rules, see Alerting rules in the Prometheus documentation.
- Create the `PrometheusRule` object:

  ```
  $ oc create -f openstack-observability-services-alerts.yaml
  ```

  The Cluster Observability Operator (COO) loads the rule into Prometheus.
- Verify that the COO loaded the rules into Prometheus:

  ```
  $ oc get prometheusrules.monitoring.rhobs -n openstack
  ```

  NOTE: You must pass the entire CRD name, `prometheusrules.monitoring.rhobs`, because a different `PrometheusRule` CRD, `prometheusrules.monitoring.coreos.com`, provides the rules for the RHOCP Monitoring API.
- Optional: Expose the Prometheus and Alertmanager services to access the Prometheus and Alertmanager dashboards:

  ```
  $ oc expose svc metric-storage-prometheus
  $ oc expose svc metric-storage-alertmanager
  ```

  NOTE: There is no access control in front of the Prometheus and Alertmanager dashboards. Exposing the services allows anybody who has the route to access your Prometheus and Alertmanager dashboards.
Creating an alert route in Alertmanager
You can configure the Telemetry Operator Alertmanager instance to deliver alert notifications to an external system, such as email, IRC, or another notification channel. The Telemetry Operator does not configure any external notifications by default. To send alert notifications to an external system, you create a Red Hat OpenShift Container Platform (RHOCP) secret that contains the configuration for Alertmanager to use. You can also build the Alertmanager configuration from locally stored templates that are referenced from the Secret alongside the native Alertmanager configuration files.
Procedure
- Create a file on your workstation named `alertmanager.yaml` that contains the native Alertmanager configuration that you want to use. For example, the following configuration sends notifications to a webhook service:

  ```yaml
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhook_configs:
    - url: 'http://example.com/'
  ```

  For information about how to configure Alertmanager, see Configuration in the Alertmanager documentation.
- Create a file on your workstation named `alertmanager-metric-storage-secret.yaml` to define the Alertmanager secret:

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: alertmanager-metric-storage
    namespace: openstack
  type: Opaque
  data:
    alertmanager.yaml: {BASE64_CONFIG}
  ```

  `metadata.name`: The `Secret` name must follow the naming convention format, `alertmanager-<name_of_Alertmanager_resource>`, in order for Alertmanager to pick up the configuration.

  Replace `{BASE64_CONFIG}` with the base64-encoded Alertmanager configuration. You can use the following command to generate a base64-encoded configuration:

  ```
  $ cat alertmanager.yaml | base64 -w 0
  ```
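The encoding step above is a plain base64 of the file contents on a single line. A quick Python equivalent, useful on platforms where `base64 -w 0` is unavailable, might look like this (the inline `config` string stands in for your real `alertmanager.yaml`):

```python
import base64

# Base64-encode the Alertmanager configuration for the Secret's
# data.alertmanager.yaml field (single line, no wrapping).
config = "route:\n  receiver: 'webhook'\n"  # stands in for alertmanager.yaml
encoded = base64.b64encode(config.encode()).decode()

# Decoding must round-trip back to the original configuration,
# and the encoded form must be a single unwrapped line.
assert base64.b64decode(encoded).decode() == config
assert "\n" not in encoded
print(encoded)
```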
- Optional: Create a local template that configures the layout and format of the alert notification. For example, you can use a template to add a title and text to the alert. Each template must be stored in the Secret as a base64-encoded entry:

  ```yaml
  data:
    alertmanager.yaml: {BASE64_CONFIG}
    template_1.tmpl: {BASE64_TEMPLATE_1}
    template_2.tmpl: {BASE64_TEMPLATE_2}
  ```

  For information about how to create Alertmanager templates, see the Go template documentation and Defining reusable templates in the Alertmanager documentation.
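As an illustration, a template file referenced as `template_1.tmpl` might define a notification title like the following. The template name `mytitle` and the fields used are example assumptions, not product defaults; `.Status` and `.CommonLabels` are standard Alertmanager notification data, and `toUpper` is a standard Alertmanager template function:

```
{{ define "mytitle" }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}{{ end }}
```

A receiver in `alertmanager.yaml` could then reference the template by name, for example with `title: '{{ template "mytitle" . }}'` in a `slack_configs` entry.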
- Create the `Secret` CR to apply the Alertmanager configuration to the Alertmanager instance managed by the `telemetry-operator`:

  ```
  $ oc apply -f alertmanager-metric-storage-secret.yaml -n openstack
  ```
- Create a file on your workstation named `alertmanager-metric-storage.yaml` to define the `Alertmanager` CR:

  ```yaml
  apiVersion: monitoring.rhobs/v1
  kind: Alertmanager
  metadata:
    name: metric-storage
    namespace: openstack
  spec:
    configSecret: alertmanager-metric-storage
  ```

  `metadata.name`: Specifies the name of the `metricStorage` Alertmanager instance.

  `spec.configSecret`: Specifies the name of the `Secret` CR that contains the alert route configuration.
- Apply the modified `Alertmanager` CR to the control plane:

  ```
  $ oc apply -f alertmanager-metric-storage.yaml -n openstack --server-side
  ```

  NOTE: You must include the `--server-side` flag to apply the Alertmanager configuration with Server-Side Apply (SSA) because the Alertmanager resource is managed by the Telemetry Operator. For more information about SSA with the Cluster Observability Operator, see Using Server-Side Apply to customize Prometheus resources.
- Verify that the `Alertmanager` configuration is applied:

  ```
  $ oc get alertmanager.monitoring.rhobs metric-storage -n openstack -o yaml --show-managed-fields | grep configSecret
        f:configSecret: {}
    configSecret: alertmanager-metric-storage
  ```
Additional resources
- Prometheus user guide on alerting
- Alerting overview
- Alerting Routes