How to use vector tap and vector top for troubleshooting in RHOCP 4
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content
Goal
The goal of this article is to show how to use vector top and vector tap commands for troubleshooting issues related to Vector when used in the Red Hat OpenShift Logging Stack.
Basic Concepts
Vector has three types of components:
- Sources: indicate where the data is ingested from. Vector supports a large set of sources. Those supported by the Red Hat OpenShift Logging product are limited to reading the infrastructure logs, audit logs and pod logs from an OpenShift cluster.
- Transforms: parse, filter, sample, or aggregate the events. Vector supports a large set of transformations. Those supported by the Red Hat OpenShift Logging product are only the ones documented with the product.
- Sinks: the destinations for the events. Vector supports a large set of sinks. Those supported by the Red Hat OpenShift Logging product are only the ones documented with the product.
vector top
vector top is a command that displays metrics as well as the topology information. To use it, run vector top inside a collector pod.
This command allows you to:
- Check whether data is being read from a source. To do so, find the desired source in the Kind column and look at the Events In and Bytes In columns.
- Check whether events are being forwarded to an output or destination. To do so, find the desired sink in the Kind column and look at the Events Out and Bytes Out columns.
- See whether errors are thrown by the sources, transforms or sinks components while the command is running. They are visible in the Errors column.
- Identify the names of the different sources, sinks and transforms from the ID column. These names are needed for the vector tap command.
As the metrics are observed in real time, a value in one of these columns indicates that reading, log forwarding, transforming or errors are occurring at that moment.
Example of the command execution:
$ oc -n <namespace> rsh <collector pod>
sh-5.1# vector top
Example of output obtained:
vector tap
vector tap allows you to observe events as they flow to and from the different Vector components (sources, transforms and sinks) in the defined pipeline.
The vector tap command can be used without any options:
$ oc -n <namespace> rsh <collector pod>
sh-5.1# vector tap
[tap] Pattern '*' successfully matched.
[tap] Warning: sink outputs cannot be tapped. Output pattern '*' matches sinks ["output_default_loki_infra", "prometheus_output", "output_default_loki_apps"]
Options available:
sh-5.1# vector tap --help
Observe output log events from source or transform components. Logs are sampled at a specified interval
Usage: vector tap [OPTIONS] [COMPONENT_ID_PATTERNS]...
Arguments:
[COMPONENT_ID_PATTERNS]... Components IDs to observe (comma-separated; accepts glob patterns)
Options:
-i, --interval <INTERVAL> Interval to sample logs at, in milliseconds [default: 500]
-u, --url <URL> GraphQL API server endpoint
-l, --limit <LIMIT> Maximum number of events to sample each interval [default: 100]
-f, --format <FORMAT> Encoding format for events printed to screen [default: json] [possible values:
json, yaml, logfmt]
--outputs-of <OUTPUTS_OF> Components (sources, transforms) IDs whose outputs to observe (comma-separated;
accepts glob patterns)
--inputs-of <INPUTS_OF> Components (transforms, sinks) IDs whose inputs to observe (comma-separated;
accepts glob patterns)
-q, --quiet Quiet output includes only events
-m, --meta Include metadata such as the event's associated component ID
-n, --no-reconnect Whether to reconnect if the underlying API connection drops. By default, tap
will attempt to reconnect if the connection drops
-h, --help Print help
The most interesting options are --outputs-of, --inputs-of and -m. The --outputs-of and --inputs-of options accept * to match all the available component IDs, or one or more specific component IDs. Quote the * so that the shell does not expand it:
sh-5.1# vector tap --outputs-of '*'
sh-5.1# vector tap --outputs-of <component ID>
sh-5.1# vector tap --inputs-of '*'
sh-5.1# vector tap --inputs-of <component ID>
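A note on the * pattern: an unquoted * is a shell glob, so the shell may replace it with the file names of the current directory before vector receives it. The following sketch illustrates the difference without calling vector. The helper show_args is hypothetical (it is not part of Vector) and only prints the arguments it receives; the scratch directory and file names are assumptions made for the demonstration.

```shell
# Hypothetical helper, not part of Vector: print each argument on its own line.
show_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Scratch directory containing two files, so that the glob has something to match.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b.log"

# Unquoted: the shell expands * to the file names before the command runs.
unquoted=$(cd "$demo_dir" && show_args --outputs-of *)

# Quoted: the literal * reaches the command, which is what a tap pattern needs.
quoted=$(cd "$demo_dir" && show_args --outputs-of '*')

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

rm -rf "$demo_dir"
```

In the unquoted case the command receives a.log and b.log instead of the pattern, while in the quoted case it receives the single argument *.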
Troubleshooting example
Let's create the application hello-node in the namespace taptest. In this example, it runs on the node worker1.example.com:
$ oc new-project taptest
$ kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname
Let's generate the log entry "Hello world, troubleshooting the collector with tap" in the hello-node pod every 30 seconds:
$ oc get pods -n taptest -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-node-595bfd9b77-kxdjr 1/1 Running 0 2m36s 10.131.3.51 worker1.example.com <none> <none>
$ pod=$(oc -n taptest get pod -l app=hello-node -o name)
$ oc -n taptest rsh $pod
~ $ while true; do sleep 30 ; echo "Hello world, troubleshooting the collector with tap" > /proc/1/fd/1; done
In a different terminal, let's identify the collector pod running on the same node:
// In Logging 5
$ oc get pods -l component=collector -n <namespace> -o wide | grep worker1.example.com
// In Logging 6
$ oc get pods -l app.kubernetes.io/component=collector -n <namespace> -o wide | grep worker1.example.com
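As an alternative to filtering the output with grep, the node can be matched on the API side with a field selector on spec.nodeName. This is a minimal sketch, not the only way to do it; the function name collector_on_node is hypothetical, and the default label is the Logging 6 one shown above.

```shell
# Hypothetical helper: print the name of the collector pod(s) scheduled on a node.
#   $1 = namespace where the collector runs
#   $2 = node name
#   $3 = label selector (optional; defaults to the Logging 6 label,
#        pass component=collector for Logging 5)
collector_on_node() {
    ns=$1
    node=$2
    selector=${3:-app.kubernetes.io/component=collector}
    oc get pods -n "$ns" -l "$selector" \
        --field-selector "spec.nodeName=$node" -o name
}

# Usage (requires a logged-in oc session):
#   collector_on_node <namespace> worker1.example.com
#   collector_on_node <namespace> worker1.example.com component=collector
```

The result is printed in the pod/<name> form, which can be passed directly to oc rsh.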
Let's enter the collector pod and identify the component IDs for the application logs (see the section "How to obtain the Vector component ID" for more details):
$ oc -n <namespace> rsh <collector pod>
// Obtain the source component IDs. In this example, it is `input_application_container` for the application logs
sh-5.1# grep "\[sources" /etc/vector/vector.toml
[sources.internal_metrics]
[sources.input_application_container]
[sources.input_infrastructure_container]
[sources.input_infrastructure_journal]
// Obtain the sink component IDs. In this example, it is `output_default_loki_apps`, as the application logs are sent to the Red Hat managed Loki
sh-5.1# grep "\[sinks" /etc/vector/vector.toml
[sinks.output_default_loki_apps]
[sinks.output_default_loki_apps.encoding]
[sinks.output_default_loki_apps.labels]
[sinks.output_default_loki_apps.tls]
[sinks.output_default_loki_apps.auth]
[sinks.output_default_loki_infra]
[sinks.output_default_loki_infra.encoding]
[sinks.output_default_loki_infra.labels]
[sinks.output_default_loki_infra.tls]
[sinks.output_default_loki_infra.auth]
[sinks.prometheus_output]
[sinks.prometheus_output.tls]
Let's verify whether Vector is reading the logs from the container of the pod hello-node* by examining the outputs of input_application_container:
sh-5.1# vector tap --outputs-of input_application_container > /tmp/vector_tap_sources
ctrl+c
sh-5.1# grep "Hello world, troubleshooting the collector with tap" /tmp/vector_tap_sources | head -1
{"file":"/var/log/pods/taptest_hello-node-595bfd9b77-kxdjr_132094be-fd93-4c12-9087-94508eb9a4a6/agnhost/0.log","hostname":"worker1.example.com","kubernetes":{"annotations":{"k8s.ovn.org/pod-networks":"{\"default\":{\"ip_addresses\":[\"10.x.x.x/x\"],\"mac_address\":\"0a:58:0a:83:03:33\",\"gateway_ips\":[\"10.x.x.x\"],\"routes\":[{\"dest\":\"10.x.x.x/14\",\"nextHop\":\"10.x.x.x\"},{\"dest\":\"172.x.x.x/16\",\"nextHop\":\"10.x.x.x\"},{\"dest\":\"100.x.x.x/16\",\"nextHop\":\"10.x.x.x\"}],\"ip_address\":\"10.x.x.x/23\",\"gateway_ip\":\"10.x.x.x\"}}","k8s.v1.cni.cncf.io/network-status":"[{\n \"name\": \"ovn-kubernetes\",\n \"interface\": \"eth0\",\n \"ips\": [\n \"10.x.x.x\"\n ],\n \"mac\": \"0a:58:0a:83:03:33\",\n \"default\": true,\n \"dns\": {}\n}]","openshift.io/scc":"restricted-v2","seccomp.security.alpha.kubernetes.io/pod":"runtime/default"},"container_id":"cri-o://f09f800cb873238bbad3e3709f6276bb4d87c5e7d101b4e9c8904ad2ef13d982","container_image":"registry.k8s.io/e2e-test-images/agnhost:2.43","container_image_id":"registry.k8s.io/e2e-test-images/agnhost@sha256:16bbf38c463a4223d8cfe4da12bc61010b082a79b4bb003e2d3ba3ece5dd5f9e","container_name":"agnhost","labels":{"app":"hello-node","pod-template-hash":"595bfd9b77"},"namespace_id":"f15d74c2-c5bc-4bf0-ad44-ae9d8d3b104c","namespace_labels":{"kubernetes.io/metadata.name":"taptest","pod-security.kubernetes.io/audit":"restricted","pod-security.kubernetes.io/audit-version":"v1.24","pod-security.kubernetes.io/warn":"restricted","pod-security.kubernetes.io/warn-version":"v1.24"},"namespace_name":"taptest","node_labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/instance-type":"vsphere-vm.cpu-4.mem-16gb.os-unknown","beta.kubernetes.io/os":"linux","failure-domain.beta.kubernetes.io/region":"redhat-region","failure-domain.beta.kubernetes.io/zone":"redhat-zone-a","ingresscontroller":"default","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"worker1.example.com","kubernetes.io/os":"linux","node-role.kubernetes.io
/worker":"","node.kubernetes.io/instance-type":"vsphere-vm.cpu-4.mem-16gb.os-unknown","node.openshift.io/os_id":"rhcos","topology.csi.vmware.com/openshift-region":"redhat-region","topology.csi.vmware.com/openshift-zone":"redhat-zone-a","topology.kubernetes.io/region":"redhat-region","topology.kubernetes.io/zone":"redhat-zone-a"},"pod_id":"132094be-fd93-4c12-9087-94508eb9a4a6","pod_ip":"10.x.x.x","pod_ips":["10.x.x.x"],"pod_name":"hello-node-595bfd9b77-kxdjr","pod_owner":"ReplicaSet/hello-node-595bfd9b77"},"message":"Hello world, troubleshooting the collector with tap","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2024-10-03T15:21:24.286079141Z"}
Let's verify whether the Vector sink is receiving the events to be sent by examining the inputs of output_default_loki_apps:
sh-5.1# vector tap --inputs-of output_default_loki_apps > /tmp/vector_tap_sinks
ctrl+c
sh-5.1# grep "Hello world, troubleshooting the collector with tap" /tmp/vector_tap_sinks | head -1
{"file":"/var/log/pods/taptest_hello-node-595bfd9b77-kxdjr_132094be-fd93-4c12-9087-94508eb9a4a6/agnhost/0.log","hostname":"worker1.example.com","kubernetes":{"container_name":"","namespace_name":"","pod_name":""},"level":"default","log_type":"application","message":"Hello world, troubleshooting the collector with tap","openshift":{"cluster_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx","sequence":1727969368254938317}}
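The two captures can also be compared with a small script instead of reading the grep output by eye. This is a minimal sketch under stated assumptions: the helper names count_message and compare_captures are hypothetical, the captures use the default json format with one event per line, and the message contains no characters that JSON would escape. As the two captures in this example were taken at different moments, the counts are not expected to be equal; the point is only whether the message appears at each stage.

```shell
# Hypothetical helper: count the events in a tap capture whose "message"
# field is exactly the given text.
#   $1 = capture file, $2 = message text
count_message() {
    # An unreadable or missing file counts as zero matches.
    [ -r "$1" ] || { echo 0; return 0; }
    # grep -c prints 0 and returns 1 when nothing matches, hence "|| true".
    grep -cF -e "\"message\":\"$2\"" "$1" || true
}

# Hypothetical helper: report whether a message was seen at the source
# output and at the sink input.
#   $1 = source capture, $2 = sink capture, $3 = message text
compare_captures() {
    src_count=$(count_message "$1" "$3")
    sink_count=$(count_message "$2" "$3")
    echo "source=$src_count sink=$sink_count"
    if [ "$src_count" -gt 0 ] && [ "$sink_count" -eq 0 ]; then
        echo "read by the source but not seen at the sink input"
    fi
}

# Usage with the files created in this example:
#   compare_captures /tmp/vector_tap_sources /tmp/vector_tap_sinks \
#       "Hello world, troubleshooting the collector with tap"
```

A message that is counted at the source but not at the sink input suggests looking at the transforms in between with vector tap --outputs-of <transform ID>.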
To copy the files out of the pod for later analysis, use the command:
$ oc -n <namespace> exec <collector pod> -- cat /path/to/file > /path/to/destination
For the files /tmp/vector_tap_sinks and /tmp/vector_tap_sources created in this example:
$ oc -n <namespace> exec <collector pod> -- cat /tmp/vector_tap_sinks > /tmp/vector_tap_sinks
$ oc -n <namespace> exec <collector pod> -- cat /tmp/vector_tap_sources > /tmp/vector_tap_sources
How to obtain the Vector component ID
Let's explain three different ways:
- As indicated earlier in this article, it can be obtained with the command vector top. Read the section "vector top" for more details.
- From a running Vector pod
- From the Vector secret
Obtaining Vector component ID from a running Vector pod
$ oc -n <namespace> rsh <collector pod>
// Get the Sources
$ grep "\[sources" /etc/vector/vector.toml
// Get the Transforms
$ grep -i "\[transforms" /etc/vector/vector.toml
// Get the Sinks
$ grep -i "\[sinks" /etc/vector/vector.toml
Obtaining Vector component ID from the Vector secret
Let's explain how to obtain them from the Vector secret. The Vector configuration is stored in a secret in the same namespace where the Vector pods are running.
The name of this secret follows the pattern <CR name>-config. If the CR is called collector or instance, the secret containing the running Vector configuration is collector-config.
Get the CR name:
// In Logging 5
$ oc get clusterlogging -n <namespace>
// In Logging 6
$ oc get obsclf -n <namespace>
Get the vector configuration from the secret where <CR name> is the name obtained from the previous command:
Warning: the vector.toml file can contain private data. Make sure that the saved vector.toml file is only readable by the user running the commands.
$ oc get secret <CR name>-config -n <namespace> -o jsonpath='{.data.vector\.toml}' |base64 -d > vector.toml
or
$ oc extract secret/<CR name>-config --keys="vector.toml" -n <namespace>
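One way to follow the warning above is to create the file with owner-only permissions before any data is written to it. This is a minimal sketch; save_private is a hypothetical helper name, and the oc command in the usage comment is the one already shown in this article.

```shell
# Hypothetical helper: write standard input to a file that only the
# current user can read or write (mode 600).
#   $1 = destination file
save_private() {
    (
        umask 077        # files created from here on get no group/other access
        : > "$1"         # create the file, or truncate it if it already exists
        chmod 600 "$1"   # tighten the mode in case the file already existed
        cat > "$1"       # store the data read from standard input
    )
}

# Usage (requires a logged-in oc session):
#   oc get secret <CR name>-config -n <namespace> \
#       -o jsonpath='{.data.vector\.toml}' | base64 -d | save_private vector.toml
```

The subshell keeps the umask change local, so the rest of the terminal session is not affected.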
Let's get the component ID:
// Get the Sources
$ grep "\[sources" vector.toml
// Get the Transforms
$ grep -i "\[transforms" vector.toml
// Get the Sinks
$ grep -i "\[sinks" vector.toml
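The grep commands above also print nested tables such as [sinks.output_default_loki_apps.encoding], which are not component IDs. The following minimal sketch prints only the IDs. It assumes that every component is declared as a [<kind>.<ID>] header on its own line and that the ID contains no dots; component_ids is a hypothetical helper name. It works on the extracted vector.toml, or on /etc/vector/vector.toml inside a collector pod if sed is available there.

```shell
# Hypothetical helper: list the component IDs of one kind
# (sources, transforms or sinks) declared in a vector.toml file.
# Headers with a further dot, such as [sinks.<ID>.encoding], are skipped.
#   $1 = kind, $2 = path to vector.toml
component_ids() {
    kind=$1
    file=$2
    # Keep only lines of the form [kind.ID] and print the ID part.
    sed -n "s/^\[$kind\.\([^].]*\)]\$/\1/p" "$file"
}

# Usage:
#   component_ids sources vector.toml
#   component_ids transforms vector.toml
#   component_ids sinks vector.toml
```

The printed names can be passed as they are to vector tap --outputs-of or --inputs-of.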
Other links of interest
Introducing vector top (vector.dev)
Introducing vector tap (vector.dev)
Using vector tap (vector.dev)