Using inspect for DG 8 troubleshooting

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

How to use the inspect command on a namespace to help with DG 8 troubleshooting?

Resolution

The inspect command is only available on OCP 4.

To get the inspect data for a specific namespace run:

    oc adm inspect ns/<namespace>

There is no project or get in the command above; the namespace is passed directly as ns/<namespace>. Full example:

$ oc project
Using project "dg-test-nyc" on server "https://api.ci-ln-origin-ci-int.example.com:6443".
$ oc adm inspect ns/dg-test-nyc
Gathering data for ns/dg-test-nyc...
Wrote inspect data to inspect.local.2129870489842186664 <---
$ ls
inspect.local.2129870489842186664 <---
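Once the dump exists, it can be packaged for attachment to a support case. A minimal sketch (the directory name is the example from above; the mkdir line only creates a stand-in so the snippet runs anywhere, and is not needed on a real dump):

```shell
# Package the inspect output for a support case.
# The directory name is an example; use the one printed by `oc adm inspect`.
INSPECT_DIR=inspect.local.2129870489842186664
mkdir -p "${INSPECT_DIR}"               # stand-in for the directory oc created
tar czf "${INSPECT_DIR}.tar.gz" "${INSPECT_DIR}"
ls -lh "${INSPECT_DIR}.tar.gz"
```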

The output directory will contain the YAML files for that specific namespace (one result-set YAML per resource type) plus the pod logs.

Example output:

.
├── inspect-logs
│   └── inspect.local.onelocal
│       ├── event-filter.html
│       ├── namespaces
│       │   └── rhdg
│       │       ├── appconnect.com
│       │       │   ├── configurations.yaml
│       │       │   ├── dashboards.yaml
│       │       │   ├── designerauthorings.yaml
│       │       │   ├── integrationflows.yaml
│       │       │   ├── integrationservers.yaml
│       │       │   ├── switchservers.yaml
│       │       │   └── traces.yaml
│       │       ├── apps
│       │       │   ├── daemonsets.yaml
│       │       │   ├── deployments.yaml
│       │       │   ├── replicasets.yaml
│       │       │   └── statefulsets.yaml
│       │       ├── apps.openshift.io
│       │       │   └── deploymentconfigs.yaml
│       │       ├── autoscaling
│       │       │   └── horizontalpodautoscalers.yaml
│       │       ├── batch
│       │       │   ├── cronjobs.yaml
│       │       │   └── jobs.yaml
│       │       ├── build.openshift.io
│       │       │   ├── buildconfigs.yaml
│       │       │   └── builds.yaml
│       │       ├── core
│       │       │   ├── configmaps.yaml <--------------------- config maps
│       │       │   ├── endpoints.yaml
│       │       │   ├── events.yaml
│       │       │   ├── persistentvolumeclaims.yaml
│       │       │   ├── pods.yaml
│       │       │   ├── replicationcontrollers.yaml
│       │       │   ├── secrets.yaml       <---------------------- secrets
│       │       │   └── services.yaml      <--------------------- services
│       │       ├── image.openshift.io
│       │       │   └── imagestreams.yaml
│       │       ├── pods
│       │       │   ├── grafana-deployment
│       │       │   │   ├── grafana
│       │       │   │   │   └── grafana
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           ├── previous.insecure.log
│       │       │   │   │           └── previous.log
│       │       │   │   ├── grafana-deployment.yaml
│       │       │   │   └── grafana-plugins-init
│       │       │   │       └── grafana-plugins-init
│       │       │   │           └── logs
│       │       │   │               ├── current.log
│       │       │   │               ├── previous.insecure.log
│       │       │   │               └── previous.log
│       │       │   ├── infinispan-operator-controller-manager             <------------operator logs
│       │       │   │   ├── infinispan-operator-controller-manager.yaml
│       │       │   │   └── manager
│       │       │   │       └── manager
│       │       │   │           └── logs
│       │       │   │               ├── current.log
│       │       │   │               └── previous.log
│       │       │   ├── prometheus-prometheus <------------------------- prometheus
│       │       │   │   ├── config-reloader
│       │       │   │   │   └── config-reloader
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           ├── previous.insecure.log
│       │       │   │   │           └── previous.log
│       │       │   │   ├── prometheus
│       │       │   │   │   └── prometheus
│       │       │   │   │       └── logs
│       │       │   │   │           ├── current.log
│       │       │   │   │           └── previous.log
│       │       │   │   └── prometheus-prometheus.yaml
│       │       │   └── rhdg-cluster
│       │       │       ├── infinispan
│       │       │       │   └── infinispan
│       │       │       │       └── logs
│       │       │       │           ├── current.log <------------------------- pod logs
│       │       │       │           ├── previous.insecure.log
│       │       │       │           └── previous.log
│       │       │       └── rhdg-cluster.yaml
│       │       ├── policy
│       │       │   └── poddisruptionbudgets.yaml
│       │       ├── rhdg.yaml
│       │       └── route.openshift.io
│       │           └── routes.yaml                  <------------------------- routes yaml
│       └── timestamp
├── inspect.logs
└── inspect-logs.zip

See the table below for paths vs. objects:

Path                                                             Objects
inspect.local.number/namespaces/$namespace/core                  configmaps, endpoints, pods, secrets, PVCs, services, events
inspect.local.number/namespaces/$namespace/pods                  pod logs, including controller logs
inspect.local.number/namespaces/$namespace/image.openshift.io    imagestreams details
inspect.local.number/namespaces/$namespace/route.openshift.io    routes
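Since the dump is made up of plain files, triage can be done offline with standard tools. A sketch that greps every collected pod log for errors (the first lines only build a tiny stand-in of the layout shown above so the snippet is self-contained; on a real dump, run the final grep alone):

```shell
# Build a stand-in of the inspect layout (illustration only, example names).
DUMP=inspect.local.example
mkdir -p "${DUMP}/namespaces/rhdg/pods/rhdg-cluster/infinispan/infinispan/logs"
printf 'INFO  cluster view updated\nERROR failed to write to cache store\n' \
  > "${DUMP}/namespaces/rhdg/pods/rhdg-cluster/infinispan/infinispan/logs/current.log"

# Offline triage: scan every collected pod log for errors.
grep -rn "ERROR" "${DUMP}"/namespaces/*/pods
```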

Root Cause

Neither inspect nor must-gather will contain the application-level detail logs/YAML objects; instead they bring the pod logs, pod YAMLs, ConfigMaps, Deployments/DeploymentConfigs, and Services/Routes information from that specific namespace.
For must-gather, see the solution Using must-gather for DG 8 Operator troubleshooting.

Retrieving CRs:

Also note that the oc adm inspect command does not retrieve the Infinispan objects, only the YAMLs and logs in the namespace. For the DG CR files, see the commands below:

Command                                      Output
oc get infinispan -o yaml                    Returns all of the Infinispan Custom Resources (CRs) in the current namespace
oc get infinispan -o yaml --all-namespaces   Returns all of the Infinispan CRs in all namespaces
oc get cache -o yaml                         Returns all of the Cache CRs in the current namespace
oc get cache -o yaml --all-namespaces        Returns all of the Cache CRs in all namespaces
oc get sub                                   Returns the Subscription objects (related to operator definitions)
oc get csv                                   Returns the ClusterServiceVersion (CSV) objects (related to operator definitions)
oc get svc                                   Returns the list of Services (related to operator operation)

Example oc get infinispan -o yaml output:

$ oc get infinispan -o yaml --all-namespaces
apiVersion: v1
items:
- apiVersion: infinispan.org/v1
  kind: Infinispan
  metadata:
    annotations:
      infinispan.org/monitoring: "false"
      infinispan.org/operatorPodTargetLabels: com.redhat.component-name,com.redhat.component-type,com.redhat.component-version,com.redhat.product-name,com.redhat.product-version
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"infinispan.org/v1","kind":"Infinispan","metadata":{"annotations":{"infinispan.org/monitoring":"false"},"labels":{"prometheus_domain":"dg-cluster","type":"middleware"},"name":"dg-cluster","namespace":"eap-test"},"spec":{"configMapName":"dg-cluster-custom-config","container":{"cpu":"2","extraJvmOpts":"-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m -Dcom.redhat.fips=false","memory":"3Gi"},"endpointSecretName":"dg-cluster-credentials","expose":{"annotations":{},"type":"Route"},"logging":{"categories":{}},"replicas":1,"security":{"endpointAuthentication":false,"endpointEncryption":{"type":"None"}},"service":{"container":{"ephemeralStorage":true},"type":"DataGrid"}}}
    creationTimestamp: 2022-06-08T15:48:30Z
    generation: 2
    labels:
      com.redhat.component-name: Data_Grid
      com.redhat.component-type: application
      com.redhat.component-version: 8.2.3
      com.redhat.product-name: Red_Hat_Runtimes
      com.redhat.product-version: 2022-Q1
      prometheus_domain: dg-cluster
...

Example of empty oc get cache -o yaml output:

$ oc get cache -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

OCP Events (present in the inspect output):

OCP events are also relevant in some cases (PVC allocation, DG operator election, OOM-killer), besides the custom resources (CRs). The events for the namespace are captured in namespaces/$namespace/core/events.yaml:

$ oc get events
LAST SEEN   TYPE      REASON                         OBJECT                                                         MESSAGE
49m         Normal    LeaderElection                 configmap/632512e4.infinispan.org                              infinispan-operator-controller-manager-86ccc8d4d4-rz59l_359f0bfa-2448-4bb4-9c5a-9a85e0c0db3d became leader
...
44m         Normal    ProvisioningSucceeded          persistentvolumeclaim/data-volume-dg-cluster-route-0           Successfully provisioned volume pvc-a2154ff0-d3b6-48f9-9e5c-2941e5983073 using kubernetes.io/gce-pd
49m         Normal    RequirementsUnknown            clusterserviceversion/datagrid-operator.v8.3.3                 requirements not yet checked
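Because the collected events.yaml is plain YAML, warnings can also be filtered offline. A sketch (the here-doc builds a stand-in events.yaml so the snippet runs anywhere; on a real dump, point grep at namespaces/$namespace/core/events.yaml):

```shell
# Stand-in for the collected core/events.yaml (illustration only).
cat > events.yaml <<'EOF'
- type: Normal
  reason: LeaderElection
- type: Warning
  reason: FailedScheduling
EOF

# Show each Warning event together with its reason.
grep -A1 'type: Warning' events.yaml
```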

Diagnostic Steps

Difference between CRDs and CRs
CRDs and CRs are different objects:

Object   Command                                             What it is
CR       oc get infinispan -o yaml (as described above)      The instance (the object, in OO terms)
CRD      oc describe crd infinispans (see details below)     The type definition (the class, in OO terms)

While CRDs are cluster-wide definitions, CRs are objects that belong to a namespace. On installation, the operator creates the CRDs in the cluster automatically. However, CRDs do not really belong to the operator: if the user uninstalls the operator, the CRDs are not deleted. Consequently, CRDs can be used to verify whether the operator was ever installed.

Using oc explain to get spec fields:
It can be useful to use the explain command; see the solution How to navigate inside a Custom Resource to know its spec in OCP 4?. Example:

$ oc explain infinispan.spec --recursive

To list all of the CRDs in the infinispan.org group, namely: infinispan, cache, batch, backup, restore:

$ oc api-resources -o wide | grep infinispan
backups                                                   infinispan.org                        true         Backup                              
batches                                                   infinispan.org                        true         Batch                                
caches                                                    infinispan.org                        true         Cache                                
infinispans                                               infinispan.org                        true         Infinispan                        
restores                                                  infinispan.org                        true         Restore                             

Use oc describe crd infinispans to see the CRD definition:

$ oc describe crd infinispans
Name:         infinispans.infinispan.org
Namespace:    
Labels:       app.kubernetes.io/name=infinispan-operator
              operators.coreos.com/datagrid.dg-test-lon=
              operators.coreos.com/datagrid.dg-test-nyc=
Annotations:  controller-gen.kubebuilder.io/version=v0.4.1
              operatorframework.io/installed-alongside-3cfb355d30d69aaa=dg-test-lon/datagrid-operator.v8.3.6
              operatorframework.io/installed-alongside-582fb7714a56795b=dg-test-nyc/datagrid-operator.v8.3.6
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2022-07-13T16:51:16Z
....

Namespace-scoped elements, such as NetworkPolicies and LimitRanges/ResourceQuotas, will also be in the inspect output.

  • NetworkPolicies impact communication with other clusters, for example.
  • LimitRanges impact Batch CRs via the pod created by the batch job. Before JDG-7031, the batch pod did not have limits and requests set, so unless the user creates a LimitRange, the created pod has no limits/requests and will not be spawned. In addition, a ResourceQuota will prevent pod creation if the configured LimitRange exceeds what the quota permits.
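A hypothetical LimitRange that would give such a batch pod default requests and limits (the name and values are illustrative, not prescribed by the product):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dg-batch-defaults        # hypothetical name
spec:
  limits:
  - type: Container
    default:                     # applied as limits when a container sets none
      cpu: "1"
      memory: 512Mi
    defaultRequest:              # applied as requests when a container sets none
      cpu: 250m
      memory: 256Mi
```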

About Alpine images

Note: Be aware that Alpine is not supported as the container OS; that is an important verification to make. Replicate the issue using a RHEL-based image or a UBI image for support.
In the Dockerfile one should have:

FROM registry.access.redhat.com/ubi8/ubi

vs

FROM artifactory.repository/alpine:3.12.0

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.