How to use vector tap and vector top for troubleshooting in RHOCP 4

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content

Goal

This article shows how to use the vector top and vector tap commands to troubleshoot issues related to Vector when it is used in the Red Hat OpenShift Logging stack.

Basic Concepts

Vector has three types of components:

  1. Sources: define where the data is ingested from.
    Vector supports a large set of sources. Those supported by the Red Hat OpenShift Logging product are limited to reading the infrastructure logs, audit logs and pod logs from an OpenShift cluster.
  2. Transforms: parse, filter, sample, or aggregate the events.
    Vector supports a large set of transforms. Only those documented with the Red Hat OpenShift Logging product are supported.
  3. Sinks: the destinations for the events.
    Vector supports a large set of sinks. Only those documented with the Red Hat OpenShift Logging product are supported.

vector top

vector top is a command that displays metrics as well as the topology information. To use it, run vector top inside a collector pod.

This command allows you to:

  • Check whether data is being read from a source. Find the desired source in the Kind column and look at the Events In and Bytes In columns.
  • Check whether events are being forwarded to an output or destination. Find the desired sink in the Kind column and look at the Events Out and Bytes Out columns.
  • See whether errors are thrown in the source, transform or sink components while the command is running. They are visible in the Errors column.
  • Identify the names of the different sources, transforms and sinks from the ID column. These names are needed for the vector tap command.

As the metrics are shown in real time, a value in one of these columns indicates that reading, transforming, log forwarding or errors are taking place.

Example of the command execution:

$ oc -n <namespace> rsh <collector pod> 
sh-5.1# vector top

Example of output obtained:

[Image: vector top output]

vector tap

vector tap allows you to observe events as they flow to and from the different Vector components (sources, transforms and sinks) in the defined pipeline.

The vector tap command can be used without any option:

$ oc -n <namespace> rsh <collector pod> 
sh-5.1# vector tap                                                                                                                                                                            
[tap] Pattern '*' successfully matched.
[tap] Warning: sink outputs cannot be tapped. Output pattern '*' matches sinks ["output_default_loki_infra", "prometheus_output", "output_default_loki_apps"]

Options available:

sh-5.1# vector tap --help
Observe output log events from source or transform components. Logs are sampled at a specified interval

Usage: vector tap [OPTIONS] [COMPONENT_ID_PATTERNS]...

Arguments:
  [COMPONENT_ID_PATTERNS]...  Components IDs to observe (comma-separated; accepts glob patterns)

Options:
  -i, --interval <INTERVAL>      Interval to sample logs at, in milliseconds [default: 500]
  -u, --url <URL>                GraphQL API server endpoint
  -l, --limit <LIMIT>            Maximum number of events to sample each interval [default: 100]
  -f, --format <FORMAT>          Encoding format for events printed to screen [default: json] [possible values:
                                 json, yaml, logfmt]
      --outputs-of <OUTPUTS_OF>  Components (sources, transforms) IDs whose outputs to observe (comma-separated;
                                 accepts glob patterns)
      --inputs-of <INPUTS_OF>    Components (transforms, sinks) IDs whose inputs to observe (comma-separated;
                                 accepts glob patterns)
  -q, --quiet                    Quiet output includes only events
  -m, --meta                     Include metadata such as the event's associated component ID
  -n, --no-reconnect             Whether to reconnect if the underlying API connection drops. By default, tap
                                 will attempt to reconnect if the connection drops
  -h, --help                     Print help

The most useful options are --outputs-of, --inputs-of and -m. The --outputs-of and --inputs-of options accept either * to match all the available component IDs or a specific component ID. Quote the * so that the shell does not expand it:

sh-5.1# vector tap --outputs-of '*'
sh-5.1# vector tap --outputs-of <component ID>
sh-5.1# vector tap --inputs-of '*'
sh-5.1# vector tap --inputs-of <component ID>
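
When a glob pattern such as * is passed to --outputs-of or --inputs-of, an unquoted pattern can be expanded by the shell into the file names of the current directory before vector tap receives it. The following sketch illustrates the difference; count_args and the scratch directory are hypothetical and are not part of Vector:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Work in a scratch directory that contains two files.
scratch=$(mktemp -d)
touch "$scratch/a.log" "$scratch/b.log"
cd "$scratch" || exit 1

# Unquoted: the shell replaces * with the matching file names (a.log b.log).
unquoted=$(count_args *)

# Quoted: the literal pattern * is passed through as a single argument,
# which is the form a command expecting a glob pattern needs to receive.
quoted=$(count_args '*')

printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
```

The same reasoning applies to partial patterns such as 'input_*'.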

Troubleshooting example

Let's create the application hello-node in the namespace taptest. In this example, its pod runs on the node worker1.example.com:

$ oc new-project taptest
$ kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname

Let's generate the log entry "Hello world, troubleshooting the collector with tap" in the hello-node pod every 30 seconds:

$ oc get pods -n taptest -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE                NOMINATED NODE   READINESS GATES
hello-node-595bfd9b77-kxdjr   1/1     Running   0          2m36s   10.131.3.51   worker1.example.com   <none>           <none>

$ pod=$(oc -n taptest get pod -l app=hello-node -o name)
$ oc -n taptest rsh $pod
~ $ while true; do sleep 30 ; echo "Hello world, troubleshooting the collector with tap" > /proc/1/fd/1; done

In a different terminal, let's identify the collector pod running on the same node:

// In Logging 5
$ oc get pods -l component=collector -n <namespace> -o wide |grep worker1.example.com

// In Logging 6
$ oc get pods -l app.kubernetes.io/component=collector -n <namespace> -o wide |grep worker1.example.com
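
Because grep matches substrings, a node name that is contained in another node name would return more than one pod. As a minimal sketch, the filtering could instead compare whole fields of the -o wide output; pod_on_node, the collector pod names and the second node name below are hypothetical:

```shell
# pod_on_node is a hypothetical helper: it reads `oc get pods -o wide` output
# (including the header line) on stdin and prints the name of every pod that
# has a field exactly equal to the given node name.
pod_on_node() {
  awk -v node="$1" 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == node) { print $1; break } }'
}

# Sample output with made-up collector pod names, for illustration only.
sample='NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                        NOMINATED NODE   READINESS GATES
collector-aaaaa   1/1     Running   0          2d    10.131.3.10   worker1.example.com         <none>           <none>
collector-bbbbb   1/1     Running   0          2d    10.128.2.11   infra-worker1.example.com   <none>           <none>'

# Only the pod on worker1.example.com is kept; a plain grep would match both rows.
found=$(printf '%s\n' "$sample" | pod_on_node worker1.example.com)
printf '%s\n' "$found"
```

On a live cluster, the helper would be fed from the oc get pods ... -o wide command for the Logging version in use, piped into pod_on_node worker1.example.com.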

Let's enter the collector pod and identify the component IDs for the application logs (see the section "How to obtain the Vector component ID" for more details):

$ oc -n <namespace> rsh <collector pod>

// Obtain the source component IDs. In this example, it is `input_application_container` for the application logs
sh-5.1# grep "\[sources" /etc/vector/vector.toml          
[sources.internal_metrics]
[sources.input_application_container]
[sources.input_infrastructure_container]
[sources.input_infrastructure_journal]


// Obtain the sink component IDs. In this example, it is `output_default_loki_apps`, as the application logs are sent to the Red Hat managed Loki
sh-5.1# grep "\[sinks" /etc/vector/vector.toml
[sinks.output_default_loki_apps]
[sinks.output_default_loki_apps.encoding]
[sinks.output_default_loki_apps.labels]
[sinks.output_default_loki_apps.tls]
[sinks.output_default_loki_apps.auth]
[sinks.output_default_loki_infra]
[sinks.output_default_loki_infra.encoding]
[sinks.output_default_loki_infra.labels]
[sinks.output_default_loki_infra.tls]
[sinks.output_default_loki_infra.auth]
[sinks.prometheus_output]
[sinks.prometheus_output.tls]

Let's verify that Vector is reading the logs from the hello-node pod by examining the outputs of input_application_container:

sh-5.1# vector tap --outputs-of input_application_container  > /tmp/vector_tap_sources
ctrl+c

sh-5.1# grep "Hello world, troubleshooting the collector with tap"  /tmp/vector_tap_sources | head -1
{"file":"/var/log/pods/taptest_hello-node-595bfd9b77-kxdjr_132094be-fd93-4c12-9087-94508eb9a4a6/agnhost/0.log","hostname":"worker1.example.com","kubernetes":{"annotations":{"k8s.ovn.org/pod-networks":"{\"default\":{\"ip_addresses\":[\"10.x.x.x/x\"],\"mac_address\":\"0a:58:0a:83:03:33\",\"gateway_ips\":[\"10.x.x.x\"],\"routes\":[{\"dest\":\"10.x.x.x/14\",\"nextHop\":\"10.x.x.x\"},{\"dest\":\"172.x.x.x/16\",\"nextHop\":\"10.x.x.x\"},{\"dest\":\"100.x.x.x/16\",\"nextHop\":\"10.x.x.x\"}],\"ip_address\":\"10.x.x.x/23\",\"gateway_ip\":\"10.x.x.x\"}}","k8s.v1.cni.cncf.io/network-status":"[{\n    \"name\": \"ovn-kubernetes\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.x.x.x\"\n    ],\n    \"mac\": \"0a:58:0a:83:03:33\",\n    \"default\": true,\n    \"dns\": {}\n}]","openshift.io/scc":"restricted-v2","seccomp.security.alpha.kubernetes.io/pod":"runtime/default"},"container_id":"cri-o://f09f800cb873238bbad3e3709f6276bb4d87c5e7d101b4e9c8904ad2ef13d982","container_image":"registry.k8s.io/e2e-test-images/agnhost:2.43","container_image_id":"registry.k8s.io/e2e-test-images/agnhost@sha256:16bbf38c463a4223d8cfe4da12bc61010b082a79b4bb003e2d3ba3ece5dd5f9e","container_name":"agnhost","labels":{"app":"hello-node","pod-template-hash":"595bfd9b77"},"namespace_id":"f15d74c2-c5bc-4bf0-ad44-ae9d8d3b104c","namespace_labels":{"kubernetes.io/metadata.name":"taptest","pod-security.kubernetes.io/audit":"restricted","pod-security.kubernetes.io/audit-version":"v1.24","pod-security.kubernetes.io/warn":"restricted","pod-security.kubernetes.io/warn-version":"v1.24"},"namespace_name":"taptest","node_labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/instance-type":"vsphere-vm.cpu-4.mem-16gb.os-unknown","beta.kubernetes.io/os":"linux","failure-domain.beta.kubernetes.io/region":"redhat-region","failure-domain.beta.kubernetes.io/zone":"redhat-zone-a","ingresscontroller":"default","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"worker1.example.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.kubernetes.io/instance-type":"vsphere-vm.cpu-4.mem-16gb.os-unknown","node.openshift.io/os_id":"rhcos","topology.csi.vmware.com/openshift-region":"redhat-region","topology.csi.vmware.com/openshift-zone":"redhat-zone-a","topology.kubernetes.io/region":"redhat-region","topology.kubernetes.io/zone":"redhat-zone-a"},"pod_id":"132094be-fd93-4c12-9087-94508eb9a4a6","pod_ip":"10.x.x.x","pod_ips":["10.x.x.x"],"pod_name":"hello-node-595bfd9b77-kxdjr","pod_owner":"ReplicaSet/hello-node-595bfd9b77"},"message":"Hello world, troubleshooting the collector with tap","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2024-10-03T15:21:24.286079141Z"}
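
The capture in this example is stopped manually with ctrl+c. Since the test message is written only every 30 seconds, the capture needs to run longer than that. A time-bounded capture could be sketched as follows, assuming the coreutils timeout command is available where it runs (this has not been verified for the collector image); capture_for is a hypothetical helper and the loop is a stand-in for vector tap:

```shell
# capture_for is a hypothetical helper: it runs a command for at most
# <seconds> and writes its standard output to <outfile>.
# Usage: capture_for <seconds> <outfile> <command> [args...]
capture_for() {
  secs=$1; outfile=$2; shift 2
  rc=0
  timeout "$secs" "$@" > "$outfile" || rc=$?
  # timeout exits with 124 when it had to stop the command; treat that as success.
  if [ "$rc" -eq 124 ]; then return 0; fi
  return "$rc"
}

# Demonstration with a stand-in command that prints one line per second forever.
demo_out=$(mktemp)
demo_rc=0
capture_for 2 "$demo_out" sh -c 'while :; do echo event; sleep 1; done' || demo_rc=$?
demo_lines=$(grep -c event "$demo_out")
printf 'rc=%s lines=%s\n' "$demo_rc" "$demo_lines"
```

Under the same assumption, the equivalent inside the collector pod would be timeout 60 vector tap --outputs-of input_application_container > /tmp/vector_tap_sources.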

Let's verify that the Vector sink is receiving the events to be sent by examining the inputs of output_default_loki_apps:

sh-5.1# vector tap --inputs-of output_default_loki_apps  > /tmp/vector_tap_sinks
ctrl+c

sh-5.1#  grep "Hello world, troubleshooting the collector with tap"  /tmp/vector_tap_sinks | head -1
{"file":"/var/log/pods/taptest_hello-node-595bfd9b77-kxdjr_132094be-fd93-4c12-9087-94508eb9a4a6/agnhost/0.log","hostname":"worker1.example.com","kubernetes":{"container_name":"","namespace_name":"","pod_name":""},"level":"default","log_type":"application","message":"Hello world, troubleshooting the collector with tap","openshift":{"cluster_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx","sequence":1727969368254938317}}
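
Beyond checking for a single match, the two capture files can be compared by counting how often the test message appears in each. The sketch below uses a hypothetical helper, count_events, and shortened stand-in events rather than real tap output. Because the captures are taken at different times and vector tap samples events, the counts are only a rough indication; the useful signal is a source-side count above zero combined with a sink-side count of zero.

```shell
# count_events is a hypothetical helper: it prints how many lines of a capture
# file contain the given fixed string (0 when there is no match).
count_events() {
  grep -cF -- "$1" "$2" || true
}

# Stand-in capture files with shortened events, for illustration only.
src_file=$(mktemp)
sink_file=$(mktemp)
printf '%s\n' \
  '{"message":"Hello world, troubleshooting the collector with tap"}' \
  '{"message":"Hello world, troubleshooting the collector with tap"}' \
  '{"message":"another application line"}' > "$src_file"
printf '%s\n' \
  '{"message":"Hello world, troubleshooting the collector with tap"}' > "$sink_file"

msg='Hello world, troubleshooting the collector with tap'
src_count=$(count_events "$msg" "$src_file")
sink_count=$(count_events "$msg" "$sink_file")
printf 'source=%s sink=%s\n' "$src_count" "$sink_count"
```

With the files from this example, the arguments would be /tmp/vector_tap_sources and /tmp/vector_tap_sinks.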

To copy the files out of the pod for later analysis, use the command:

$ oc -n <namespace> exec <collector pod> -- cat /path/to/file > /path/to/destination

For this example, where the files /tmp/vector_tap_sinks and /tmp/vector_tap_sources were created:

$ oc -n <namespace> exec <collector pod> -- cat /tmp/vector_tap_sinks > /tmp/vector_tap_sinks
$ oc -n <namespace> exec <collector pod> -- cat /tmp/vector_tap_sources > /tmp/vector_tap_sources

How to obtain the Vector component ID

There are three different ways:

  1. With the command vector top, as indicated earlier in this article. Read the section vector top for more details.
  2. From a running Vector pod
  3. From the Vector secret

Obtaining the Vector component ID from a running Vector pod

$ oc -n <namespace> rsh <collector pod>

// Get the Sources
$ grep "\[sources" /etc/vector/vector.toml 

// Get the Transforms
$ grep -i "\[transforms" /etc/vector/vector.toml 

// Get the Sinks
$ grep -i "\[sinks" /etc/vector/vector.toml 

Obtaining the Vector component ID from the Vector secret

The Vector configuration is stored in a secret in the same namespace where the Vector pods are running.

The secret name follows the pattern <CR name>-config. If the CR is called collector or instance, the secret containing the running Vector configuration is collector-config.
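
The naming rule can be written down as a small function. This is only a sketch of the rule as stated in this article; config_secret_name and the CR name my-forwarder are hypothetical:

```shell
# config_secret_name is a hypothetical helper that applies the naming rule
# described in this article to a CR name.
config_secret_name() {
  case "$1" in
    collector|instance) printf '%s\n' 'collector-config' ;;
    *)                  printf '%s-config\n' "$1" ;;
  esac
}

config_secret_name instance       # prints collector-config
config_secret_name my-forwarder   # prints my-forwarder-config
```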

Get the CR name:

// In Logging 5
$ oc get clusterlogging -n <namespace>

// In Logging 6
$ oc get obsclf -n <namespace> 

Get the Vector configuration from the secret, where <CR name> is the name obtained from the previous command:

Warning: vector.toml can contain private data. Make sure that the saved vector.toml file is readable only by the user running the commands.

$ oc get secret <CR name>-config -n <namespace> -o jsonpath='{.data.vector\.toml}' |base64 -d > vector.toml
or 
$ oc extract secret/<CR name>-config --keys="vector.toml" -n <namespace>
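
One way to make sure the saved file is readable only by the current user is to write it under a restrictive umask. The following is a minimal sketch; save_private is a hypothetical helper and the demonstration uses placeholder content instead of a real configuration:

```shell
# save_private is a hypothetical helper: it writes stdin to the given file so
# that only the owner can read or write it.
save_private() {
  # The subshell keeps the restrictive umask from leaking into the caller;
  # chmod covers the case where the file already existed with wider permissions.
  ( umask 077; cat > "$1" ) && chmod 600 "$1"
}

# Demonstration with placeholder content instead of a real configuration.
demo_dir=$(mktemp -d)
printf '%s\n' '[sources.example]' | save_private "$demo_dir/vector.toml"
perms=$(ls -l "$demo_dir/vector.toml" | cut -c1-10)
printf '%s\n' "$perms"
```

With the first command above, the redirection > vector.toml would be replaced by a pipe into save_private vector.toml.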

Let's get the component IDs:

// Get the Sources
$ grep "\[sources" vector.toml 

// Get the Transforms
$ grep -i "\[transforms" vector.toml

// Get the Sinks
$ grep -i "\[sinks" vector.toml
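
The grep commands also return the sub-tables of each component (for example .encoding or .tls), so the same component ID appears several times. As a minimal sketch, the IDs could be reduced to a unique list with sed; list_component_ids is a hypothetical helper, and it assumes component IDs that contain no dots or quotes:

```shell
# list_component_ids is a hypothetical helper: it prints the unique component
# IDs of one kind (sources, transforms or sinks) found in a Vector TOML file.
# Usage: list_component_ids <kind> <file>
list_component_ids() {
  # Keep only the part between "[<kind>." and the next "." or "]".
  sed -n "s/^\[$1\.\([^].]*\).*/\1/p" "$2" | sort -u
}

# Stand-in configuration containing only table headers, for illustration only.
demo_toml=$(mktemp)
cat > "$demo_toml" <<'EOF'
[sources.input_application_container]
[sinks.output_default_loki_apps]
[sinks.output_default_loki_apps.encoding]
[sinks.output_default_loki_apps.tls]
[sinks.prometheus_output]
[sinks.prometheus_output.tls]
EOF

sink_ids=$(list_component_ids sinks "$demo_toml")
printf '%s\n' "$sink_ids"
```

Against the extracted file, the call would be list_component_ids sinks vector.toml, and likewise for sources and transforms.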

Other links of interest

Introducing vector top (vector.dev)
Introducing vector tap (vector.dev)
Using vector tap (vector.dev)
