How to collect cluster Prometheus metrics in Red Hat OpenShift Container Platform
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 3.11
- 4
- Prometheus
- Metrics
Issue
- How to collect cluster Prometheus metrics in Red Hat OpenShift Container Platform.
- How to provide a specific time range from a Prometheus DB.
Resolution
Complete collection
- One option to capture cluster Prometheus metrics is the following script. Once the data capture is complete, please share this data with Red Hat Technical Support via a support case.
cat <<'EOF' > prometheus-metrics.sh
#!/usr/bin/env bash
function queue() {
  local TARGET="${1}"
  shift
  local LIVE
  LIVE="$(jobs | wc -l)"
  while [[ "${LIVE}" -ge 45 ]]; do
    sleep 1
    LIVE="$(jobs | wc -l)"
  done
  echo "${@}"
  if [[ -n "${FILTER:-}" ]]; then
    "${@}" | "${FILTER}" >"${TARGET}" &
  else
    "${@}" >"${TARGET}" &
  fi
}

ARTIFACT_DIR=$PWD
mkdir -p $ARTIFACT_DIR/metrics

echo "Snapshotting prometheus (may take 15s) ..."
queue ${ARTIFACT_DIR}/metrics/prometheus.tar.gz oc --insecure-skip-tls-verify exec -n openshift-monitoring prometheus-k8s-0 -- tar cvzf - -C /prometheus .
FILTER=gzip queue ${ARTIFACT_DIR}/metrics/prometheus-target-metadata.json.gz oc --insecure-skip-tls-verify exec -n openshift-monitoring prometheus-k8s-0 -- /bin/bash -c "curl -G http://localhost:9090/api/v1/targets/metadata --data-urlencode 'match_target={instance!=\"\"}'"
wait
EOF
bash prometheus-metrics.sh
The script above may not work reliably: the generated tar file might fail to extract because the script copies files while Prometheus is actively writing to them. Using the collection method below might be a better option.
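Before attaching the snapshot to a support case, it can help to confirm the archive actually extracts. A minimal sketch, assuming the `metrics/prometheus.tar.gz` path the script above writes to; the small sample archive built here only makes the sketch self-contained and would not exist in a real capture:

```shell
# Sketch: verify the captured archive is readable before uploading it.
# The sample archive built here stands in for the real capture, which the
# collection script writes to ${ARTIFACT_DIR}/metrics/prometheus.tar.gz.
mkdir -p metrics sample/chunks
echo '{}' > sample/meta.json
tar -czf metrics/prometheus.tar.gz -C sample .

# tar -t lists the archive contents without extracting; a non-zero exit
# status indicates the archive is truncated or corrupt.
if tar -tzf metrics/prometheus.tar.gz > /dev/null 2>&1; then
  echo "archive OK"
else
  echo "archive corrupt - re-run the capture"
fi
```

If the listing fails, re-running the capture during a quieter period, or switching to the partial collection method, avoids copying files mid-write.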
Partial collection
- If a specific time range from your Prometheus database needs to be selected, use the following commands to list the Block ULID, copy the directories, and then compress the folders.
- Get the list of time block chunks you wish to review:
$ oc exec -n openshift-monitoring prometheus-k8s-0 -- promtool tsdb list -r /prometheus
BLOCK ULID MIN TIME MAX TIME DURATION NUM SAMPLES NUM CHUNKS NUM SERIES SIZE
01GGQV2KWQ7DX0RAHWZZFPCNTM 2022-10-31 17:17:39 +0000 UTC 2022-10-31 18:00:00 +0000 UTC 42m20.389s 15306022 215664 215457 46MiB747KiB476B
01GGRRZ7B4VMK4PGENCKC642FK 2022-10-31 18:00:00 +0000 UTC 2022-11-01 00:00:00 +0000 UTC 5h59m59.811s 133853179 1122268 195484 161MiB1016KiB281B
01GGSDJDEDKAVW84H85123N5DW 2022-11-01 00:00:00 +0000 UTC 2022-11-01 06:00:00 +0000 UTC 5h59m59.811s 135702215 1140143 199963 166MiB899KiB888B
01GGT25KJ6HQ0SK622J57PNZ4M 2022-11-01 06:00:00 +0000 UTC 2022-11-01 12:00:00 +0000 UTC 5h59m59.811s 135950916 1147756 230258 190MiB936KiB527B
01GGTPRRYWQXTNE8C4D3KD40WF 2022-11-01 12:00:00 +0000 UTC 2022-11-01 18:00:00 +0000 UTC 5h59m59.811s 135618858 1169963 224173 178MiB160KiB467B
01GGVBBYM6R3MV21Q84KCAAMMH 2022-11-01 18:00:00 +0000 UTC 2022-11-02 00:00:00 +0000 UTC 5h59m59.811s 139185809 1165518 201794 172MiB206KiB895B
01GGVZZ4CDD2FN548AZ5QWHYSF 2022-11-02 00:00:00 +0000 UTC 2022-11-02 06:00:00 +0000 UTC 5h59m59.811s 139566486 1171364 204422 169MiB81KiB250B
01GGWMJBTHJEH4VQBC2361RTEH 2022-11-02 06:00:00 +0000 UTC 2022-11-02 12:00:00 +0000 UTC 5h59m59.811s 140284758 1175662 203168 172MiB559KiB886B
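To pick out the ULIDs covering a particular day without copying them by hand, the listing can be filtered with awk. A sketch using two rows of the example output above; in practice, pipe the `oc exec ... promtool tsdb list` command's output into awk instead of the inline heredoc:

```shell
# Sketch: print the ULID (first column) of each block whose MIN/MAX time
# range mentions the target day. Two rows of the example listing are inlined
# here so the sketch is self-contained.
awk -v day="2022-11-01" '$0 ~ day { print $1 }' <<'EOF'
01GGQV2KWQ7DX0RAHWZZFPCNTM  2022-10-31 17:17:39 +0000 UTC  2022-10-31 18:00:00 +0000 UTC  42m20.389s
01GGRRZ7B4VMK4PGENCKC642FK  2022-10-31 18:00:00 +0000 UTC  2022-11-01 00:00:00 +0000 UTC  5h59m59.811s
EOF
```

The output of the filter can then be assigned directly to the `blocks` variable used in the script below.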
- From the previous output, update the script below (or execute its lines sequentially) to pull the block and chunk data for analysis (you must set the values of "blocks" to match your output first).
#!/bin/bash
## IMPORTANT: set the block ULIDs to be copied first in a space-delimited list (depends upon your output above)
blocks="01GGQV2KWQ7DX0RAHWZZFPCNTM 01GGRRZ7B4VMK4PGENCKC642FK"
CAPTUREDIR=./data
mkdir -p $CAPTUREDIR
for i in $(echo $blocks); do mkdir -p $CAPTUREDIR/$i/chunks; done
for i in $(echo $blocks); do for file in index meta.json tombstones; do oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- cat /prometheus/$i/$file > $CAPTUREDIR/$i/$file; done; done
for i in $(echo $blocks); do oc cp -n openshift-monitoring prometheus-k8s-0:/prometheus/$i -c prometheus $CAPTUREDIR/$i; done
oc cp -n openshift-monitoring -c prometheus prometheus-k8s-0:chunks_head $CAPTUREDIR/chunks_head
# Note: capturing WAL segments may report "file changed as we read it" - this error can be ignored
oc cp -n openshift-monitoring -c prometheus prometheus-k8s-0:wal $CAPTUREDIR/wal
oc cp -n openshift-monitoring -c prometheus prometheus-k8s-0:queries.active $CAPTUREDIR/queries.active
tar -zcvf prometheus-db.tar.gz ${CAPTUREDIR}
The resulting data directory should contain the following files:
~/Downloads/data $ tree
.
├── 01K94F3NNVR4YG19FB8E1YDZ7K # chunk directory for specific time selection
│ ├── chunks
│ │ └── 000001
│ ├── index
│ ├── meta.json
│ └── tombstones
├── 01K94NZZ6M1194YNR4KBFCFF27
│ ├── chunks
│ │ ├── 000001
│ │ └── 000002
│ ├── index
│ ├── meta.json
│ └── tombstones
├── chunks_head
├── queries.active
└── wal
└── 00000000
Ensure that the index, meta.json, and tombstones files are present for each block directory - without these the block cannot be parsed, and you may need to go back and pull the missing files explicitly from the pod. chunks_head, queries.active, and wal are also required to parse the chunk blocks.
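The completeness check above can be scripted. A hedged sketch, assuming the same `CAPTUREDIR=./data` layout as the collection script; the sample block directory created here only makes the sketch self-contained and would come from the oc cp commands in practice:

```shell
# Sketch: confirm every captured block directory contains the three files
# required for parsing (index, meta.json, tombstones).
CAPTUREDIR=./data
# Sample block directory standing in for a real capture.
mkdir -p "$CAPTUREDIR/01GGQV2KWQ7DX0RAHWZZFPCNTM/chunks"
touch "$CAPTUREDIR/01GGQV2KWQ7DX0RAHWZZFPCNTM/index" \
      "$CAPTUREDIR/01GGQV2KWQ7DX0RAHWZZFPCNTM/meta.json" \
      "$CAPTUREDIR/01GGQV2KWQ7DX0RAHWZZFPCNTM/tombstones"

missing=0
for dir in "$CAPTUREDIR"/01*/; do      # block ULIDs in this capture start with 01
  for f in index meta.json tombstones; do
    if [ ! -e "${dir}${f}" ]; then
      echo "missing: ${dir}${f}"
      missing=1
    fi
  done
done
[ "$missing" -eq 0 ] && echo "all block directories complete"
```

Any "missing:" line means that file should be pulled again from the pod before compressing and uploading the capture.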
Cluster Observability Operator
In the case of needing to collect a Prometheus dump from the Cluster Observability Operator, use the following commands. Note that the Cluster Observability Operator Prometheus instance pods do not contain the tar binary:
$ datadir=./data
$ mkdir -p $datadir
## set the block ULIDs to be copied:
$ blocks="01J0T747YMVTQHT5AK122TD9K8 01J0T08GPMBY81J1SHHRYSS0GF 01J0SSCSEKP4RRR5MK9DQ9S103"
$ for i in $(echo $blocks); do mkdir -p $datadir/$i/chunks; done
$ for i in $(echo $blocks); do for file in index meta.json tombstones; do oc exec -n $NAMESPACE prometheus-coo-monitoring-stack-0 -c prometheus -- cat /prometheus/$i/$file > $datadir/$i/$file; done; done
$ for i in $(echo $blocks); do oc exec -n $NAMESPACE prometheus-coo-monitoring-stack-0 -c prometheus -- cat /prometheus/$i/chunks/000001 > $datadir/$i/chunks/000001; done
$ tar zcvf prometheus-db.tar.gz ${datadir}
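The loop above copies only the first chunk segment (000001), but a block can contain more (the tree output earlier shows a 000002 segment). A hedged sketch of discovering the segment names first; the `pod/prometheus` directory here is a local stand-in for the pod filesystem, since the sketch cannot assume cluster access:

```shell
# Sketch: copy every chunk segment of a block, not just 000001.
# In practice, list the segments with:
#   oc exec -n $NAMESPACE prometheus-coo-monitoring-stack-0 -c prometheus -- ls /prometheus/$block/chunks
datadir=./data
block=01J0T747YMVTQHT5AK122TD9K8

# Local stand-in for /prometheus inside the pod (two segments).
mkdir -p "pod/prometheus/$block/chunks"
touch "pod/prometheus/$block/chunks/000001" "pod/prometheus/$block/chunks/000002"

mkdir -p "$datadir/$block/chunks"
for seg in $(ls "pod/prometheus/$block/chunks"); do
  # In practice: oc exec ... -- cat /prometheus/$block/chunks/$seg > "$datadir/$block/chunks/$seg"
  cp "pod/prometheus/$block/chunks/$seg" "$datadir/$block/chunks/$seg"
done
ls "$datadir/$block/chunks"
```

This keeps the `cat`-based copy approach (required because the pods lack tar) while ensuring no segment is silently skipped.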
Opening Prometheus data for offline review with podman
This can be accomplished using podman and a simple launch script:
$ cat prometheus_viewer.sh
#!/bin/bash
#prometheus launcher
#provided as-is with no warranty for use in troubleshooting or analysis of prometheus data bundles.
#
#script assumes you have podman installed, firefox is available, and that you can call the file ~/Downloads/data.
#script also assumes you have a pull-secret available to reference to acquire the correct runtime image.
#
#Running this script will prompt you first to confirm you have completed the requisite step of un-compressing
#the tarball of the promql output into your downloads directory, and then will prompt for the exact clusterversion
#after which it will kick-start a container and open a web browser session to the locally running prometheus instance.
##-----Script start-----##
echo "this script assumes you have already expanded a tarball of the prometheus-db.tar.gz file to the directory: ~/Downloads/data"
echo "press return to continue or ctrl + c to abort and do that first"
read emptyvar1
#ensure we match the same version of promql from openshift that was used to generate the data
echo "insert clusterversion for the bundle you are reviewing"
read OCP_VERSION
IMG="$(oc adm release info --image-for=prometheus ${OCP_VERSION})"
#load pull-secret (can be obtained from https://console.redhat.com/openshift/downloads)"
PULL_SECRET=<path-to-pull-secret.txt-here>
### Retrieve the image URL and run the container (here using `-ti` option to run it in foreground mode)
### The script below will assume that our data directory is in ~/Downloads and load $PWD/data after going there.
### You may need to adjust where the data directory is, and where the script takes you to execute below if different.
### Script will also try to launch firefox, change this to chrome or edge or your preferred browser executable
### Or just open your webrowser at that page.
#navigate to ~/Downloads so we can reference $PWD/data
cd ~/Downloads/
#open firefox so the page is available and load the container as a session in this shell in the foreground.
echo "launching firefox in a new window and starting the container below - press ctrl+c to stop the process and remove the container when finished"
firefox http://localhost:9090/ &
podman run --rm --authfile=${PULL_SECRET} -it -u $(id -u):$(id -g) -p 9090:9090 -v $PWD/data:/data:U,Z $IMG --storage.tsdb.path=/data --storage.tsdb.retention.time=999d --config.file=/dev/null
NOTE: The script requires a pull secret - edit the script above to include the path to this file locally on your machine before execution.
Usage:
- Copy this script to your local machine where podman and a local browser like firefox are available, and un-compress the prometheus tarball into the path ~/Downloads (this should result in the folder ~/Downloads/data being created).
- Modify the script to include a path to the local pull-secret file where the script can reference it.
- Modify the script to select a different browser if not using firefox, or to point to where the expanded data folder is.
- Execute the script, pressing return to acknowledge the first message, then insert the exact clusterversion (4.18.24 for example) to pull that image version.
- Open your browser to http://localhost:9090 (if it does not automatically launch) and wait for the container to finish setup - you may need to refresh the page once or twice before it will populate.
- Query for a wide time range (7d or so) with a very generic query in graph view, for example: sum(kube_node_status_condition{condition="Ready", status="true"}==1) # number of ready nodes in the cluster
- Highlight the block where you see data in the graph to see the specific time segments you exported; this will adjust your range to that timeframe only, after which you can run the queries you want with the correctly scoped view.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.