Using inspect for DG 8 troubleshooting
Environment
- Red Hat OpenShift Container Platform (OCP)
- 4.x
- Red Hat Data Grid (RHDG)
- 8.x
Issue
How to use oc adm inspect on a namespace to help with DG 8 troubleshooting?
Resolution
Inspect is only available on OCP 4
To get the inspect data for a specific namespace run:
oc adm inspect ns/<namespace>
The command takes the namespace directly as ns/<namespace>; it does not use the project or get subcommands. Full example:
$ oc project
Using project "dg-test-nyc" on server "https://api.ci-ln-origin-ci-int.example.com:6443".
$ oc adm inspect ns/dg-test-nyc
Gathering data for ns/dg-test-nyc...
Wrote inspect data to inspect.local.2129870489842186664 <---
$ ls
inspect.local.2129870489842186664 <---
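The collection step above can be wrapped in a small helper when the data has to be shared as a single file. This is a minimal sketch rather than an official procedure: the function name collect_inspect, the default directory name, and the archive name are assumptions made for illustration. The --dest-dir option of oc adm inspect selects the output directory instead of the generated inspect.local.<number> name.

```shell
# Hypothetical helper: gather inspect data for one namespace and archive it.
# Usage: collect_inspect <namespace> [destination-directory]
collect_inspect() {
  local ns="$1"
  local dest="${2:-inspect-$ns}"   # assumed default directory name

  # Write the inspect data to a predictable directory.
  oc adm inspect "ns/$ns" --dest-dir="$dest" || return 1

  # Compress the directory so it can be shared as a single file.
  tar -czf "$dest.tar.gz" "$dest"
}
```

For example, collect_inspect dg-test-nyc would write the data to inspect-dg-test-nyc and create inspect-dg-test-nyc.tar.gz next to it.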
The inspect directory contains the YAML files for that specific namespace, including resultset.yaml.
Example output:
.
├── inspect-logs
│   └── inspect.local.onelocal
│       ├── event-filter.html
│       ├── namespaces
│       │   └── rhdg
│       │       ├── appconnect.com
│       │       │   ├── configurations.yaml
│       │       │   ├── dashboards.yaml
│       │       │   ├── designerauthorings.yaml
│       │       │   ├── integrationflows.yaml
│       │       │   ├── integrationservers.yaml
│       │       │   ├── switchservers.yaml
│       │       │   └── traces.yaml
│       │       ├── apps
│       │       │   ├── daemonsets.yaml
│       │       │   ├── deployments.yaml
│       │       │   ├── replicasets.yaml
│       │       │   └── statefulsets.yaml
│       │       ├── apps.openshift.io
│       │       │   └── deploymentconfigs.yaml
│       │       ├── autoscaling
│       │       │   └── horizontalpodautoscalers.yaml
│       │       ├── batch
│       │       │   ├── cronjobs.yaml
│       │       │   └── jobs.yaml
│       │       ├── build.openshift.io
│       │       │   ├── buildconfigs.yaml
│       │       │   └── builds.yaml
│       │       ├── core
│       │       │   ├── configmaps.yaml <--- config maps
│       │       │   ├── endpoints.yaml
│       │       │   ├── events.yaml
│       │       │   ├── persistentvolumeclaims.yaml
│       │       │   ├── pods.yaml
│       │       │   ├── replicationcontrollers.yaml
│       │       │   ├── secrets.yaml <--- secrets
│       │       │   └── services.yaml <--- services
│       │       ├── image.openshift.io
│       │       │   └── imagestreams.yaml
│       │       ├── pods
│       │       │   ├── grafana-deployment
│       │       │   │   ├── grafana
│       │       │   │   │   └── grafana
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           ├── previous.insecure.log
│       │       │   │   │           └── previous.log
│       │       │   │   ├── grafana-deployment.yaml
│       │       │   │   └── grafana-plugins-init
│       │       │   │       └── grafana-plugins-init
│       │       │   │           └── logs
│       │       │   │               ├── current.log
│       │       │   │               ├── previous.insecure.log
│       │       │   │               └── previous.log
│       │       │   ├── infinispan-operator-controller-manager <--- operator logs
│       │       │   │   ├── infinispan-operator-controller-manager.yaml
│       │       │   │   └── manager
│       │       │   │       └── manager
│       │       │   │           └── logs
│       │       │   │               ├── current.log
│       │       │   │               └── previous.log
│       │       │   ├── prometheus-prometheus <--- prometheus
│       │       │   │   ├── config-reloader
│       │       │   │   │   └── config-reloader
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           ├── previous.insecure.log
│       │       │   │   │           └── previous.log
│       │       │   │   ├── prometheus
│       │       │   │   │   └── prometheus
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           └── previous.log
│       │       │   │   └── prometheus-prometheus.yaml
│       │       │   └── rhdg-cluster
│       │       │       ├── infinispan
│       │       │       │   └── infinispan
│       │       │       │       └── logs
│       │       │       │           ├── current.log <--- pod logs
│       │       │       │           ├── previous.insecure.log
│       │       │       │           └── previous.log
│       │       │       └── rhdg-cluster.yaml
│       │       ├── policy
│       │       │   └── poddisruptionbudgets.yaml
│       │       ├── rhdg.yaml
│       │       └── route.openshift.io
│       │           └── routes.yaml <--- routes yaml
│       └── timestamp
├── inspect.logs
└── inspect-logs.zip
See table below for paths vs objects:
| Path | Objects |
|---|---|
| inspect.local.number/namespaces/$namespace/core | configmaps, endpoints, pods, secrets, PVCs, services, events |
| inspect.local.number/namespaces/$namespace/pods | pod logs, including controller logs |
| inspect.local.number/namespaces/$namespace/image.openshift.io | imagestreams details |
| inspect.local.number/namespaces/$namespace/route.openshift.io | routes |
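The paths in the table can be browsed with standard tools once the inspect directory is available locally. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the ERROR pattern assumes the pod logs mark error lines with that word, which may not hold for every container.

```shell
# Hypothetical helpers for browsing an inspect directory.
# $1 = inspect directory (for example inspect.local.<number>), $2 = namespace.

# Print every current.log collected for the pods of the namespace.
list_pod_logs() {
  find "$1/namespaces/$2/pods" -type f -name current.log | sort
}

# Print "<file>:<count>" with the number of lines containing ERROR
# in each current.log (the ERROR pattern is an assumption).
count_log_errors() {
  list_pod_logs "$1" "$2" | while IFS= read -r log; do
    printf '%s:%s\n' "$log" "$(grep -c 'ERROR' "$log")"
  done
}
```

For example, count_log_errors inspect.local.<number> rhdg gives a quick view of which pod log deserves a closer look first.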
Root Cause
Neither inspect nor must-gather collects the application detail logs or YAML objects. Instead, they collect the pod logs, pod YAMLs, configmaps, deployments/deploymentconfigs, and services/routes information from that specific namespace.
For must-gather, see the solution Using must-gather for DG 8 Operator troubleshooting.
Retrieving CRs:
Also note that the oc adm inspect command does not retrieve the Infinispan objects, only the YAMLs and logs in the namespace. To get the DG CR files, use the commands below:
| Command | Output |
|---|---|
| oc get infinispan -o yaml | Returns all of the Infinispan Custom Resources (CR) in that namespace |
| oc get infinispan -o yaml --all-namespaces | Returns all of the Infinispan Custom Resources (CR) in all namespaces |
| oc get cache -o yaml | Returns all of the Cache Custom Resources (CR) in this namespace |
| oc get cache -o yaml --all-namespaces | Returns all of the Cache Custom Resources (CR) in all namespaces |
| oc get sub | Returns the subscription object (related to operator definitions) |
| oc get csv | Returns the csv object (related to operator definitions) |
| oc get svc | Returns list of services (related to operator operation) |
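Because the CRs are not part of the inspect data, it can help to save them next to it in one step. The sketch below is only an illustration: the function name collect_dg_crs and the output file names are assumptions, and it uses only the oc get commands listed in the table.

```shell
# Hypothetical helper: save the Data Grid CRs of a namespace as YAML files.
# Usage: collect_dg_crs <namespace> [output-directory]
collect_dg_crs() {
  local ns="$1"
  local out="${2:-dg-crs-$ns}"   # assumed default directory name
  local kind

  mkdir -p "$out" || return 1
  for kind in infinispan cache; do
    # One file per resource type, for example <output-directory>/infinispan.yaml
    oc get "$kind" -n "$ns" -o yaml > "$out/$kind.yaml"
  done

  # Operator-related objects: subscription, csv, and services.
  oc get sub,csv,svc -n "$ns" > "$out/operator-objects.txt"
}
```

Running collect_dg_crs dg-test-nyc after the inspect keeps the CRs and the namespace data together for later review.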
Example oc get infinispan -o yaml output:
$ oc get infinispan -o yaml --all-namespaces
apiVersion: v1
items:
- apiVersion: infinispan.org/v1
  kind: Infinispan
  metadata:
    annotations:
      infinispan.org/monitoring: "false"
      infinispan.org/operatorPodTargetLabels: com.redhat.component-name,com.redhat.component-type,com.redhat.component-version,com.redhat.product-name,com.redhat.product-version
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"infinispan.org/v1","kind":"Infinispan","metadata":{"annotations":{"infinispan.org/monitoring":"false"},"labels":{"prometheus_domain":"dg-cluster","type":"middleware"},"name":"dg-cluster","namespace":"eap-test"},"spec":{"configMapName":"dg-cluster-custom-config","container":{"cpu":"2","extraJvmOpts":"-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m -Dcom.redhat.fips=false","memory":"3Gi"},"endpointSecretName":"dg-cluster-credentials","expose":{"annotations":{},"type":"Route"},"logging":{"categories":{}},"replicas":1,"security":{"endpointAuthentication":false,"endpointEncryption":{"type":"None"}},"service":{"container":{"ephemeralStorage":true},"type":"DataGrid"}}}
    creationTimestamp: 2022-06-08T15:48:30Z
    generation: 2
    labels:
      com.redhat.component-name: Data_Grid
      com.redhat.component-type: application
      com.redhat.component-version: 8.2.3
      com.redhat.product-name: Red_Hat_Runtimes
      com.redhat.product-version: 2022-Q1
      prometheus_domain: dg-cluster
...
Example of an empty oc get cache -o yaml output:
$ oc get cache -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
OCP Events (present in the inspect):
Besides the custom resources (CRs), OCP events are also relevant in some instances (PVC allocation, DG operator leader election, OOM killer). The events of the namespace should be in namespaces/$namespace/core/events.yaml:
$ oc get events
LAST SEEN TYPE REASON OBJECT MESSAGE
49m Normal LeaderElection configmap/632512e4.infinispan.org infinispan-operator-controller-manager-86ccc8d4d4-rz59l_359f0bfa-2448-4bb4-9c5a-9a85e0c0db3d became leader
...
44m Normal ProvisioningSucceeded persistentvolumeclaim/data-volume-dg-cluster-route-0 Successfully provisioned volume pvc-a2154ff0-d3b6-48f9-9e5c-2941e5983073 using kubernetes.io/gce-pd
49m Normal RequirementsUnknown clusterserviceversion/datagrid-operator.v8.3.3 requirements not yet checked
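Event lists grow quickly, so filtering them is often the first step. The following is a minimal sketch: the function names are hypothetical, and the offline helper assumes that each event in events.yaml carries its own "reason:" line, which should be verified against the actual file.

```shell
# Hypothetical helpers for reviewing namespace events.

# Live cluster: list the events of a namespace sorted by last timestamp.
events_by_time() {
  oc get events -n "$1" --sort-by=.lastTimestamp
}

# Live cluster: show only Warning events of a namespace.
warning_events() {
  oc get events -n "$1" --field-selector type=Warning
}

# Offline: count events per reason in an events.yaml from the inspect.
# Assumes one "reason:" line per event in the YAML.
event_reasons() {
  grep -E '^ *reason:' "$1" | sed 's/^ *reason: *//' | sort | uniq -c | sort -rn
}
```

For example, event_reasons namespaces/$namespace/core/events.yaml shows at a glance which reasons occur most often in the collected data.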
Diagnostic Steps
Difference between CRDs and CRs
CRDs and CRs are different objects:
| Object | Command | What it is |
|---|---|---|
| CRs | oc get infinispan -o yaml (for Infinispan CRs, as described above) | A CR is the instance (the object in OO terms) |
| CRD | oc describe crd infinispans (see details below) | A CRD is the type definition (the class in OO terms) |
While CRDs are cluster-wide definitions, CRs are objects that belong to a namespace. On installation, the operator creates the CRDs in the cluster automatically. However, CRDs do not really belong to the operator: if the user uninstalls the operator, the CRDs are not deleted. Consequently, the presence of the CRDs can be used to verify whether the operator was ever installed.
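The CRD check described above can be scripted. This is a sketch under stated assumptions: the function names are hypothetical, and a failing oc get crd may also mean the current user is not allowed to read CRDs, so a negative result is not conclusive.

```shell
# Hypothetical check: is the Infinispan CRD present in the cluster?
# The CRD survives an operator uninstall, so its presence indicates
# that the operator is, or once was, installed.
dg_crd_present() {
  oc get crd infinispans.infinispan.org >/dev/null 2>&1
}

# Print a short human-readable result of the check.
report_dg_crd() {
  if dg_crd_present; then
    echo "CRD infinispans.infinispan.org found: the operator is or was installed"
  else
    echo "CRD infinispans.infinispan.org not found (or no permission to read CRDs)"
  fi
}
```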
Using oc explain to get spec fields:
The oc explain command can be useful here; see the solution How to navigate inside a Custom Resource to know its spec in OCP 4?. Example:
$ oc explain infinispan.spec --recursive
List all CRDs in the infinispan.org group, namely infinispan, cache, batch, backup, and restore:
### list all CRDs in infinispan namely: infinispan, cache, batch, backup, restore
$ oc api-resources -o wide | grep infinispan
backups infinispan.org true Backup
batches infinispan.org true Batch
caches infinispan.org true Cache
infinispans infinispan.org true Infinispan
restores infinispan.org true Restore
Example oc describe crd infinispans output:
$ oc describe crd infinispans
Name: infinispans.infinispan.org
Namespace:
Labels: app.kubernetes.io/name=infinispan-operator
operators.coreos.com/datagrid.dg-test-lon=
operators.coreos.com/datagrid.dg-test-nyc=
Annotations: controller-gen.kubebuilder.io/version=v0.4.1
operatorframework.io/installed-alongside-3cfb355d30d69aaa=dg-test-lon/datagrid-operator.v8.3.6
operatorframework.io/installed-alongside-582fb7714a56795b=dg-test-nyc/datagrid-operator.v8.3.6
API Version: apiextensions.k8s.io/v1
Kind: CustomResourceDefinition
Metadata:
Creation Timestamp: 2022-07-13T16:51:16Z
....
Namespace-bound elements, such as NetworkPolicies and LimitRanges/ResourceQuotas, will be in the inspect.
- A NetworkPolicy impacts communication with other clusters, for example.
- LimitRanges impact Batch CRs, because of the pod created by the batch job. Before JDG-7031, the batch pod did not have limits and requests set, so unless the user creates a LimitRange, the pod is created without limits/requests and does not get spawned. Also, the quota prevents the pod creation if the limit range that is set is above what the quota permits.
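On a live cluster, the same namespace-bound objects can be reviewed directly. A minimal sketch, assuming access to the namespace; the function name namespace_constraints is hypothetical:

```shell
# Hypothetical helper: dump the NetworkPolicies, LimitRanges, and
# ResourceQuotas of a namespace as YAML for review.
# Usage: namespace_constraints <namespace>
namespace_constraints() {
  oc get networkpolicy,limitrange,resourcequota -n "$1" -o yaml
}
```

An empty items list in the output means that none of these objects exist in the namespace.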
About Alpine images
Note: Alpine is not supported as a container OS, so this is an important verification. To get support, replicate the issue using a RHEL base image or a UBI image.
The Dockerfile should have:
FROM registry.access.redhat.com/ubi8/ubi
rather than:
FROM artifactory.repository/alpine:3.12.0
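When the Dockerfile is not at hand, the base OS can usually be read from /etc/os-release inside a running pod. The following is a sketch under stated assumptions: the function names are hypothetical, the container must provide cat, and the check relies on Alpine reporting ID=alpine in that file.

```shell
# Hypothetical check: does the os-release content on stdin describe Alpine?
is_alpine_os_release() {
  grep -q '^ID=alpine$'
}

# Read /etc/os-release from a running pod and report the result.
# Prints "not-alpine" also when the file cannot be read, so a negative
# result should be confirmed manually.
# Usage: check_pod_base_os <pod> <namespace>
check_pod_base_os() {
  if oc exec "$1" -n "$2" -- cat /etc/os-release | is_alpine_os_release; then
    echo "alpine"
  else
    echo "not-alpine"
  fi
}
```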