Using must-gather for DG 8 Operator troubleshooting
Environment
- Red Hat OpenShift Container Platform (OCP)
  - 4.x
- Red Hat Data Grid (RHDG)
  - 8.x
Issue
How to use OCP 4's must-gather in a namespace for DG 8 Operator troubleshooting?
Resolution
Collecting
$ oc adm must-gather
Note: the command is oc adm must-gather; there is no get in it.
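A minimal sketch of collecting both data sets discussed in this article for one DG namespace, assuming a logged-in oc session with cluster-admin rights. The function name collect_dg_data and the default output directory ./dg-diagnostics are hypothetical; the sketch relies only on the --dest-dir option of oc adm must-gather and oc adm inspect:

```shell
# Hypothetical helper: collect a must-gather plus an inspect of the DG namespace.
# Assumes `oc` is on PATH and already logged in with cluster-admin rights.
collect_dg_data() {
  ns="$1"                       # namespace where DG 8 is deployed, e.g. dg-test-84
  out="${2:-./dg-diagnostics}"  # local output directory (hypothetical default)
  if [ -z "$ns" ]; then
    echo "usage: collect_dg_data <namespace> [output-dir]" >&2
    return 1
  fi
  mkdir -p "$out" || return 1
  # Cluster-wide dump: OLM objects (csv, sub, ip, og) and cluster-scoped resources
  oc adm must-gather --dest-dir="$out/must-gather" || return 1
  # Namespace dump: pod logs, pod YAMLs, ConfigMaps, Services/Routes
  oc adm inspect "ns/$ns" --dest-dir="$out/inspect" || return 1
}
```

For example, collect_dg_data dg-test-84 ./case-data would write must-gather/ and inspect/ subdirectories under ./case-data, keeping the two bundles separate.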
Explanation
A must-gather is a bundle of data dumped from each namespace prefixed with openshift; therefore, it includes the OLM objects: subscriptions, clusterserviceversions, installplans, and operatorgroups. This is useful in certain scenarios, for instance when troubleshooting operator upgrades.
Outside of these scenarios, must-gather is rarely used; OpenShift Service Mesh (OSSM) integration is one exception.
If your must-gather includes custom namespaces, i.e. namespaces beyond the openshift-* ones, see below:
Must-gather with custom namespaces:
This type of must-gather can be used to investigate the upgrade process of any operator, which relies on the install plan, as detailed in How to upgrade from DG 8.1/8.2 to DG 8.3 in DG 8 Operator.
The must-gather will not contain the controller (operator) pod, the application pods, or the custom resources. However, the subscription, install plan, and CSV (ClusterServiceVersion) can be used to verify upgrades, specifically the currently installed version.
Example:
$ ls must-gather.local./quay-io-openshift-id/namespaces/an-namespace/operators.coreos.com
clusterserviceversions installplans operatorconditions operatorgroups subscriptions
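A minimal sketch that lists which OLM resource kinds a must-gather captured for one namespace, assuming the directory layout shown in the example above (the image directory name varies, so it is matched with a glob). The function name olm_kinds is hypothetical:

```shell
# Hypothetical helper: list the OLM resource kinds (clusterserviceversions,
# installplans, subscriptions, ...) captured for one namespace.
# Assumed layout: <must-gather-dir>/<image-dir>/namespaces/<ns>/operators.coreos.com/<kind>/
olm_kinds() {
  mg="$1"
  ns="$2"
  found=1
  for dir in "$mg"/*/namespaces/"$ns"/operators.coreos.com/*/; do
    [ -d "$dir" ] || continue   # glob did not match anything
    basename "$dir"
    found=0
  done
  return "$found"               # non-zero if the namespace has no OLM data
}
```

A non-zero return is a quick signal that the namespace was not captured, which is the usual situation for a must-gather with default namespaces only.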
Must-gather without custom namespaces (default namespaces only):
What is inside the default must-gather:
$ cd must-gather
$ ls
cluster-scoped-resources event-filter.html namespaces timestamp
$ ls namespaces
dg-test-84 <--- (namespace of interest) openshift-cluster-machine-approver openshift-controller-manager-operator openshift-kube-apiserver-operator openshift-multus
kube-system openshift-cluster-node-tuning-operator openshift-dns openshift-kube-controller-manager openshift-network-diagnostics
openshift
... <--- several other openshift ns
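To spot a namespace of interest such as dg-test-84 among the many platform namespaces, a minimal sketch, assuming the argument is the directory that contains namespaces/ (as after the cd above). The function name custom_namespaces and the choice of which names count as platform namespaces are assumptions:

```shell
# Hypothetical helper: print the namespaces in a must-gather that are not
# platform namespaces (default, openshift, openshift-*, kube-*).
custom_namespaces() {
  for dir in "$1"/namespaces/*/; do
    [ -d "$dir" ] || continue
    ns=$(basename "$dir")
    case "$ns" in
      default|openshift|openshift-*|kube-*) ;;  # platform namespace: skip
      *) echo "$ns" ;;
    esac
  done
}
```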
| Object | Usage | Definition |
|---|---|---|
| ClusterServiceVersions | Used to see the current CSV version | OLM uses an API called ClusterServiceVersion (CSV) to describe a single version of an operator. |
| InstallPlans | Used for upgrades | An InstallPlan is an OLM custom resource that defines the path for upgrading the current operator to the next version. |
| OperatorConditions | Used for communication between the operator and OLM | An OperatorCondition is an OLM custom resource that provides a communication channel between OLM and an operator it manages. |
| OperatorGroups | Used for operator namespace targeting | The OperatorGroup is associated with the Subscription in the same namespace. |
| Subscriptions | Used for operator installation (OLM's CR for operator installation) | Kind Subscription (OLM's API) |
| Operator YAML | Used for listing the operator on the cluster | Kind Operator |
Verifying whether the upgrade succeeded:
Check the phase in the CSV file inside the must-gather:
$ cat must-gather.local.id/quay-id/namespaces/namespace/operators.coreos.com/clusterserviceversions/datagrid-operator.v8.3.8.yaml | grep phase
f:phase: {}
- description: Current phase of the backup operation
path: phase
- description: Current phase of the batch operation
path: phase
- description: Current phase of the restore operation
path: phase
phase: Pending
phase: Pending
phase: InstallReady
....
phase: Succeeded <---------------
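The grep above returns every line containing phase (managedFields, spec descriptors, and status conditions). A minimal sketch that prints only the top-level status.phase, assuming the standard 2-space YAML indentation produced by oc get -o yaml; the function name csv_phase is hypothetical:

```shell
# Hypothetical helper: print only the top-level status.phase of a CSV YAML file,
# skipping the other "phase" matches (managedFields, descriptors, conditions).
# Assumes 2-space indentation as produced by `oc get -o yaml`.
csv_phase() {
  awk '
    /^[^ #-]/ { in_status = ($0 ~ /^status:/) }   # a new top-level key starts
    in_status && /^  phase:/ {
      sub(/^  phase:[ ]*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, csv_phase must-gather.local.id/quay-id/namespaces/namespace/operators.coreos.com/clusterserviceversions/datagrid-operator.v8.3.8.yaml prints a single value, which for the file above is expected to be the Succeeded line marked with the arrow.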
Root Cause
Inspect is not the same as must-gather.
Neither inspect nor must-gather contains application-level detail, such as detailed application logs or the YAMLs of the custom resource objects.
Inspect collects pod logs, pod YAMLs, ConfigMaps, and Service/Route information.
Must-gather collects information about the installed operators, such as the CSV and the subscription.
Also note that the oc adm inspect command does not retrieve the Infinispan custom resources, only the YAMLs and logs in the namespace. For details about inspect, see Using inspect for DG 8 troubleshooting.
| Object | Purpose |
|---|---|
| Infinispan CR | Relevant for Infinispan Cluster, Cache, Batch, Backup/Restore CR definitions |
| Inspect | Relevant for operator pod issues in the user namespace, Infinispan pod issues, and memory allocation issues |
| Must-gather | Relevant for openshift-* namespaces; collects the operator installation objects (can be replaced by oc get csv and oc get subscription) |
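Since the operator installation objects can also be read from a live cluster, a minimal sketch that queries them for one namespace in a single call, assuming a logged-in oc session. The function name olm_status is hypothetical; csv, sub, ip, and og are the OLM short names for clusterserviceversions, subscriptions, installplans, and operatorgroups:

```shell
# Hypothetical helper: live-cluster equivalent of the OLM data in a must-gather.
# Assumes `oc` is logged in and the OLM CRDs are installed.
olm_status() {
  if [ -z "$1" ]; then
    echo "usage: olm_status <namespace>" >&2
    return 1
  fi
  # One call with comma-separated resource types, scoped to the given namespace
  oc get csv,sub,ip,og -n "$1"
}
```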
Diagnostic Steps
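- To inspect subscriptions (sub), operator groups (og), CSVs (csv), and install plans (ip): oc adm must-gather
- For pod logs, pod YAMLs, services, and the controller pod, collect an inspect: oc adm inspect ns/
- From the cluster-scoped-resources directory, search for the CSV name to see which CRDs were installed alongside it: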
$ grep -ri "datagrid-operator.v8.4.2"
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/batches.infinispan.org.yaml: operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/backups.infinispan.org.yaml: operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/caches.infinispan.org.yaml: operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/restores.infinispan.org.yaml: operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/infinispans.infinispan.org.yaml: operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
...
spec: ... scope: Namespaced
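A minimal sketch that automates the grep above and prints only the CRD names whose installed-alongside annotation points to a given namespace/CSV pair, assuming the cluster-scoped-resources layout shown in the output above. The function name crds_for_csv is hypothetical:

```shell
# Hypothetical helper: list CRDs whose operatorframework.io/installed-alongside-*
# annotation value equals <namespace>/<csv>.
crds_for_csv() {
  crd_dir="$1/cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions"
  target="$2/$3"   # e.g. dg-test-84/datagrid-operator.v8.4.2
  for f in "$crd_dir"/*.yaml; do
    [ -f "$f" ] || continue   # glob matched nothing
    # Exact match on the annotation value (last field), not a substring match
    if awk -v t="$target" '
         index($0, "operatorframework.io/installed-alongside-") && $NF == t { found = 1 }
         END { exit !found }
       ' "$f"; then
      basename "$f" .yaml
    fi
  done
}
```

For example, crds_for_csv . dg-test-84 datagrid-operator.v8.4.2, run from the directory that contains cluster-scoped-resources, would list backups.infinispan.org, batches.infinispan.org, and so on. Comparing the last field exactly, rather than grepping for a substring, avoids also matching a longer version string that starts with the same characters.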
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.