Using must-gather for DG 8 Operator troubleshooting

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

How to use OCP 4's must-gather in a namespace for DG 8 Operator troubleshooting?

Resolution

Collecting

$ oc adm must-gather

Note that there is no "get" in the command above: it is oc adm must-gather, not oc get must-gather.
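The collection step above can be wrapped in a small shell function, sketched below as an illustration. It writes the must-gather to a known directory (by default oc creates ./must-gather.local.<random number>) and compresses it for attaching to a support case. The function name collect_must_gather and the default directory ./must-gather-dg8 are hypothetical; --dest-dir is a standard oc adm must-gather option.

```shell
#!/usr/bin/env bash
# collect_must_gather [DEST_DIR]  (hypothetical helper name)
# Collects the default OCP 4 must-gather into DEST_DIR and compresses it.
collect_must_gather() {
  # Hypothetical default directory name, chosen for illustration only.
  local dest="${1:-./must-gather-dg8}"
  # The subcommand is "oc adm must-gather"; there is no "get" in it.
  oc adm must-gather --dest-dir="$dest" || return 1
  # Compress the result so it can be attached to a support case.
  tar czf "${dest}.tar.gz" "$dest"
}
```

Usage: collect_must_gather ./mg-case produces ./mg-case and ./mg-case.tar.gz.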

Explanation

A must-gather is a bundle of data dumped from each namespace prefixed with openshift, and therefore it contains the subscriptions, clusterserviceversions, installplans and operatorgroups. This is useful in certain scenarios, for instance when troubleshooting upgrades.
Outside these scenarios must-gather is rarely needed; OSSM (OpenShift Service Mesh) integration issues are one exception.

If your must-gather includes custom namespaces, that is, namespaces beyond the openshift-* ones, see below:

Must-gather with custom namespaces:

This type of must-gather can be used to investigate the upgrade process of any operator, which relies on the install plan, as detailed in How to upgrade from DG 8.1/8.2 to DG 8.3 in DG 8 Operator.
The must-gather will not contain the controller pod, the application pods, or the custom resources. However, the subscription, install plan and CSV (clusterserviceversion) can be used to verify upgrades, specifically the currently installed version.

Example:

$ ls must-gather.local./quay-io-openshift-id/namespaces/an-namespace/operators.coreos.com
clusterserviceversions  installplans  operatorconditions  operatorgroups  subscriptions
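The listing above can be reproduced for any namespace in a local must-gather with a short sketch like the one below. It assumes the layout shown above (<must-gather dir>/<image dir>/namespaces/<namespace>/operators.coreos.com/<kind>/<name>.yaml); because the image directory name varies, it is matched with a glob. The function name list_olm_resources is hypothetical.

```shell
#!/usr/bin/env bash
# list_olm_resources MG_DIR NAMESPACE  (hypothetical helper name)
# Prints "<kind>/<file>.yaml" for every OLM object dumped for NAMESPACE.
list_olm_resources() {
  local mg="$1" ns="$2" dir
  # The image-named directory (quay-io-...) varies, so match it with a glob.
  for dir in "$mg"/*/namespaces/"$ns"/operators.coreos.com; do
    # If the glob matched nothing, $dir is the literal pattern: skip it.
    [ -d "$dir" ] || continue
    (cd "$dir" && find . -name '*.yaml' | sed 's|^\./||' | sort)
  done
}
```

An empty result suggests the namespace was not collected, which is the "without custom namespaces" case described next.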

Must-gather without custom namespaces (only the default ones):

What is inside the default must-gather:

$ cd must-gather
$ ls
cluster-scoped-resources  event-filter.html  namespaces  timestamp
$ ls namespaces
dg-test-84    <--- (namespace of interest)
kube-system
openshift
openshift-cluster-machine-approver
openshift-cluster-node-tuning-operator
openshift-controller-manager-operator
openshift-dns
openshift-kube-apiserver-operator
openshift-kube-controller-manager
openshift-multus
openshift-network-diagnostics
... <--- several other openshift namespaces
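To tell quickly whether a must-gather includes custom namespaces (such as dg-test-84 above) or only default ones, the namespaces directory can be filtered as in the sketch below. Treating openshift, openshift-*, kube-* and default as the "default" namespaces is an assumption made for this example, and the function name list_custom_namespaces is hypothetical.

```shell
#!/usr/bin/env bash
# list_custom_namespaces MG_DIR  (hypothetical helper name)
# Prints the namespaces in the must-gather that are not default ones.
list_custom_namespaces() {
  local mg="$1" d name
  for d in "$mg"/*/namespaces/*/; do
    [ -d "$d" ] || continue
    name=$(basename "$d")
    case "$name" in
      # Assumed set of default namespaces: skip them.
      openshift|openshift-*|kube-*|default) ;;
      *) echo "$name" ;;
    esac
  done
}
```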
Object / Usage / Definition:

  • clusterserviceversions: used to see the current CSV version. OLM uses an API called ClusterServiceVersion (CSV) to describe a single instance of a version of an operator.
  • installplans: used for upgrades. An InstallPlan is a CRD that sets the path for upgrading the current operator to the next version.
  • operatorconditions: used for communication between the operator and OLM. An OperatorCondition is a CRD that creates a communication channel between OLM and an operator it manages.
  • operatorgroups: used for operator namespace targeting. The OperatorGroup (OG) is associated with the Subscription in the same namespace.
  • subscriptions: used for operator installation. Kind Subscription, OLM's custom resource (API) for installing an operator.
  • operator YAML: used for listing the operator on the cluster. Kind Operator.
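Of the objects listed above, the Subscription usually gives the quickest view of the installed version: OLM records the channel in spec.channel and the CSVs in status.currentCSV and status.installedCSV, and when the two CSV fields differ an upgrade is typically still pending (for example, awaiting manual InstallPlan approval). The sketch below extracts those fields from a Subscription YAML in the must-gather. It uses naive line-based parsing that assumes oc's two-space YAML indentation (a tool such as yq is more robust), and the function name subscription_summary is hypothetical.

```shell
#!/usr/bin/env bash
# subscription_summary SUBSCRIPTION_YAML  (hypothetical helper name)
# Prints spec.channel, status.currentCSV and status.installedCSV.
# Naive parsing: relies on top-level keys at column 0 and their
# direct children indented by exactly two spaces.
subscription_summary() {
  awk '
    /^[^ #]/ { section = $1 }   # current top-level key, e.g. "spec:"
    section == "spec:"   && /^  channel:/      { print "channel=" $2 }
    section == "status:" && /^  currentCSV:/   { print "currentCSV=" $2 }
    section == "status:" && /^  installedCSV:/ { print "installedCSV=" $2 }
  ' "$1"
}
```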

Verifying whether the upgrade succeeded:

Check the phase in the CSV file inside the must-gather:

$ cat must-gather.local.id/quay-id/namespaces/namespace/operators.coreos.com/clusterserviceversions/datagrid-operator.v8.3.8.yaml | grep phase
        f:phase: {}
      - description: Current phase of the backup operation
        path: phase
      - description: Current phase of the batch operation
        path: phase
      - description: Current phase of the restore operation
        path: phase
    phase: Pending
    phase: Pending
    phase: InstallReady
....
  phase: Succeeded <---------------
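In the grep output above, only the last line (indented by two spaces directly under status:) appears to be the CSV's current status.phase; the other matches come from managedFields, the spec's status descriptors, and the status.conditions history (Pending, InstallReady, ...). The sketch below prints just that field, so the check can be scripted. It relies on oc's two-space YAML indentation, and the function names csv_phase and csv_upgrade_succeeded are hypothetical.

```shell
#!/usr/bin/env bash
# csv_phase CSV_YAML  (hypothetical helper name)
# Prints status.phase from a ClusterServiceVersion YAML dumped by must-gather.
csv_phase() {
  awk '
    /^[^ #]/ { section = $1 }   # track the current top-level key
    # Only a direct child of status: (two spaces), not conditions entries.
    section == "status:" && /^  phase:/ { print $2 }
  ' "$1"
}

# csv_upgrade_succeeded CSV_YAML: exit status 0 only if status.phase is Succeeded.
csv_upgrade_succeeded() {
  [ "$(csv_phase "$1")" = "Succeeded" ]
}
```

Usage: csv_upgrade_succeeded .../clusterserviceversions/datagrid-operator.v8.3.8.yaml && echo "upgrade completed".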

Root Cause

Inspect is not the same as must-gather.
Neither inspect nor must-gather includes the detailed application logs or the application's YAML objects.
Inspect collects the pod logs, pod YAMLs, ConfigMaps, and Service/Route information.
Must-gather collects information about the installed operators, such as the CSV and the subscription.

Also note that the oc adm inspect command does not retrieve the Infinispan custom resources, only the YAMLs and logs in the namespace. For details about inspect see Using inspect for DG 8 troubleshooting.
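Because inspect does not capture the Infinispan custom resources, one option is to export them separately next to the inspect output, as in the sketch below. The resource names are the CRDs listed under cluster-scoped-resources in the Diagnostic Steps (infinispans, caches, backups, restores and batches in the infinispan.org group). The function name collect_dg_namespace, the default directory ./inspect-<namespace> and the output file infinispan-crs.yaml are hypothetical.

```shell
#!/usr/bin/env bash
# collect_dg_namespace NAMESPACE [DEST_DIR]  (hypothetical helper name)
collect_dg_namespace() {
  local ns="$1" dest="${2:-./inspect-$1}"
  # Pod logs, pod YAMLs, ConfigMaps, Services/Routes for the namespace.
  oc adm inspect "ns/$ns" --dest-dir="$dest" || return 1
  # inspect does not dump the Infinispan custom resources, so export them
  # explicitly using the fully qualified CRD names.
  oc get infinispans.infinispan.org,caches.infinispan.org,backups.infinispan.org,restores.infinispan.org,batches.infinispan.org \
    -n "$ns" -o yaml > "$dest/infinispan-crs.yaml"
}
```

Usage: collect_dg_namespace dg-test-84, then compress ./inspect-dg-test-84 for the support case.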

Object / Purpose:

  • Infinispan CR: relevant for the Infinispan cluster, Cache, Batch, and Backup/Restore CR definitions.
  • Inspect: relevant for operator pod issues in the user namespace, Infinispan pod issues, and memory allocation issues.
  • Must-gather: relevant for the openshift namespaces; it brings the operator installation details (which can also be obtained with oc get csv and oc get subscription).
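As noted above, the operator installation details can also be read live with oc get instead of a full must-gather. The sketch below prints each CSV with its phase, each OLM Subscription with its installed CSV, and the InstallPlans of a namespace. The function name olm_status is hypothetical; the fully qualified subscriptions.operators.coreos.com name is used as a precaution because other APIs on a cluster may also define a "subscription" resource.

```shell
#!/usr/bin/env bash
# olm_status NAMESPACE  (hypothetical helper name)
olm_status() {
  local ns="$1"
  # Name and status.phase of each CSV (the PHASE column of "oc get csv").
  oc get csv -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
  # Name and status.installedCSV of each OLM Subscription.
  oc get subscriptions.operators.coreos.com -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.installedCSV}{"\n"}{end}'
  # InstallPlans with their default columns (CSV, approval mode, approved).
  oc get installplans.operators.coreos.com -n "$ns"
}
```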

Diagnostic Steps

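  1. To inspect the Subscription (sub), OperatorGroup (og), ClusterServiceVersion (csv) and InstallPlan (ip), collect a must-gather: oc adm must-gather
  2. For pod logs, pod YAMLs, services and the controller pod, collect an inspect: oc adm inspect ns/
  3. To find the CRDs installed alongside a given operator version, search the cluster-scoped-resources directory: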
$ grep -ri "datagrid-operator.v8.4.2"
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/batches.infinispan.org.yaml:    operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/backups.infinispan.org.yaml:    operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/caches.infinispan.org.yaml:    operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/restores.infinispan.org.yaml:    operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2
cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/infinispans.infinispan.org.yaml:    operatorframework.io/installed-alongside-965c1f8e0971398: dg-test-84/datagrid-operator.v8.4.2

...

spec:
  ...
  scope: Namespaced
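The grep in step 3 can be narrowed to print only the names of the CRDs whose operatorframework.io/installed-alongside-* annotation points at an exact CSV name (so datagrid-operator.v8.4.2 does not also match a hypothetical v8.4.20). The sketch below assumes the directory layout shown in the grep output and is run against the must-gather image directory that contains cluster-scoped-resources; the function name crds_installed_with is hypothetical.

```shell
#!/usr/bin/env bash
# crds_installed_with MG_IMAGE_DIR CSV_NAME  (hypothetical helper name)
# Prints the CRD names whose installed-alongside annotation equals CSV_NAME.
crds_installed_with() {
  local dir="$1/cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions"
  local csv="$2"
  [ -d "$dir" ] || return 1
  awk -v csv="$csv" '
    /operatorframework\.io\/installed-alongside-/ {
      v = $NF                  # annotation value "<namespace>/<csv-name>"
      sub(/^.*\//, "", v)      # keep only "<csv-name>"
      if (v == csv) print FILENAME
    }' "$dir"/*.yaml |
    sort -u |
    sed -e 's|^.*/||' -e 's|\.yaml$||'   # file path -> CRD name
}
```

Usage: crds_installed_with . datagrid-operator.v8.4.2 from the directory where the grep above was run.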

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.