How to create and gather sos reports in bulk for an OpenShift 4 cluster?

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat Enterprise Linux (RHEL) (as jumpserver)
    • 8.4+
    • 9
  • sos
    • 4

Issue

  • Is it possible to create and gather sos reports in bulk for an OCP 4 cluster?
  • How to collect sos reports from several OpenShift 4 nodes at the same time?

Resolution

With the sos 4 tool, which is provided starting with Red Hat Enterprise Linux 8.4, it is possible to collect a set of sos reports from a number of nodes. For kinds of nodes other than OpenShift ones, refer to "How do I collect an sos report from multiple nodes at the same time".

Prerequisites

Collect the sos reports


When kubeconfig file is available


If the `oc` binary is installed and connected to an OpenShift cluster, the following command collects the sos reports for all the master nodes. The `--all-logs` parameter can be removed if the generated sos reports are too big; note that this command collects one sos report per node, so the resulting archive will be large:
$ sos collect --cluster-type ocp --no-local \
  -e openshift -e openshift_ovn -e openvswitch -e podman -e crio \
  -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on -k networking.ethtool-namespaces=off --all-logs --plugin-timeout=600

Note: to collect sos reports from specific nodes only, use the --nodes= parameter with a comma-separated list of node names (e.g.: --nodes=openshift-worker-1.example.com,openshift-worker-0.example.com).
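
The list for --nodes can also be built from the cluster itself rather than typed by hand. The following is a minimal sketch, assuming `oc` is logged in and that the worker nodes carry the usual `node-role.kubernetes.io/worker` label. The helper name `nodes_to_csv` is hypothetical and is not part of sos or oc.

```shell
# Hypothetical helper: turn `oc get nodes -o name` output (one "node/<name>" per line)
# into the comma-separated list expected by --nodes.
nodes_to_csv() {
  sed 's|^node/||' | paste -sd, -
}

# Example with static input, so the helper can be tried without a cluster:
printf 'node/openshift-worker-0.example.com\nnode/openshift-worker-1.example.com\n' | nodes_to_csv

# Assumed usage against a live cluster (not run here):
#   NODES=$(oc get nodes -l node-role.kubernetes.io/worker -o name | nodes_to_csv)
#   sos collect --cluster-type ocp --no-local --nodes="$NODES"
```

The static example prints the two node names joined by a comma, which is the form the --nodes parameter takes.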

When kubeconfig file is not available, or a node is not yet part of the OpenShift cluster


If the `kubeconfig` file is not available, it is also possible to pass additional parameters to the `sos collect` command so that it connects through SSH: the user to connect to the nodes as, the SSH key, and the comma-separated list of nodes to collect the sos reports from:
$ sos collect --case-id 00009999 --no-local \
  --nopasswd-sudo \
  --ssh-user core -i ~/.ssh/openshift.key \
  --nodes=openshift-worker-1.example.com,openshift-worker-0.example.com,openshift-master-0.example.com,openshift-master-1.example.com,openshift-master-2.example.com \
  -e openshift -e openshift_ovn -e openvswitch -e podman -e crio \
  -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on -k networking.ethtool-namespaces=off --all-logs --plugin-timeout=600

Note: Replace items in the above command as needed. If any of the plugins times out, or not all the information is collected, it may be necessary to add or raise the --plugin-timeout parameter (for example, --plugin-timeout=600) to give the plugins more time.
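
Because this method depends on key-based SSH access to every node in the list, it can help to confirm that access before starting a long collection. The following is a rough sketch, assuming the `core` user and the key path from the command above. The helper names `split_nodes`, `can_ssh` and `report_unreachable` are hypothetical and are not part of sos.

```shell
# Hypothetical helpers, not part of sos. User and key path are taken from the
# example command and should be adjusted as needed.
SSH_USER=core
SSH_KEY="$HOME/.ssh/openshift.key"

# Split a comma-separated node list into one name per line.
split_nodes() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Return 0 if a non-interactive, key-based SSH login to the node works.
can_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 -i "$SSH_KEY" "$SSH_USER@$1" true
}

# Print every node from the list that cannot be reached.
# stdin is redirected so that ssh does not consume the loop input.
report_unreachable() {
  split_nodes "$1" | while read -r node; do
    can_ssh "$node" </dev/null || echo "unreachable: $node"
  done
}

# Assumed usage (not run here):
#   report_unreachable "openshift-worker-0.example.com,openshift-master-0.example.com"
```

Any node reported as unreachable can then be fixed or left out of the --nodes list before running `sos collect`.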

Example


An output similar to the following one will be shown:
sos-collector (version 4.2)

This utility is used to collect sos reports from multiple nodes simultaneously.
Remote connections are made and/or maintained to those nodes via well-known
transport protocols such as SSH.

An archive of sos report tarballs collected from the nodes will be generated in
/var/tmp/sos.igt7ewr5 and may be provided to an appropriate support
representative.

The generated archive may contain data considered sensitive and its content
should be reviewed by the originating organization before being passed to any
third party.

No configuration changes will be made to the system running this utility or
remote systems that it connects to.


Press ENTER to continue, or CTRL-C to quit



sos-collector ASSUMES that SSH keys are installed on all nodes unless the
--password option is provided.


Connected to openshift-master-0.example.com, determining cluster type...

The following is a list of nodes to collect from:
	openshift-master-0.example.com
	openshift-master-1.example.com
	openshift-master-2.example.com
	openshift-worker-0.example.com
	openshift-worker-1.example.com


Press ENTER to continue with these nodes, or press CTRL-C to quit



Connecting to nodes...

Beginning collection of sosreports from 5 nodes, collecting a maximum of 4 concurrently

openshift-worker-1              : Generating sosreport...
openshift-master-0              : Generating sosreport...
openshift-worker-0              : Generating sosreport...
openshift-master-2              : Generating sosreport...
openshift-worker-0              : Retrieving sosreport...
openshift-worker-0              : Successfully collected sosreport
openshift-master-1              : Generating sosreport...
openshift-worker-1              : Retrieving sosreport...
openshift-worker-1              : Successfully collected sosreport
openshift-master-2              : Retrieving sosreport...
openshift-master-2              : Successfully collected sosreport
openshift-master-0              : Retrieving sosreport...
openshift-master-0              : Successfully collected sosreport
openshift-master-1              : Retrieving sosreport...
openshift-master-1              : Successfully collected sosreport

The following archive has been created. Please provide it to your support team.
	/var/tmp/sos-collector-00009999-2021-05-31-fuzyr.tar.xz
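
Before sending the archive, it can be useful to confirm that it holds one sos report per node. The following is a small sketch, assuming GNU tar (which detects the compression format when reading) and assuming that the per-node reports are stored inside the archive as `.tar.xz` or `.tar.gz` files. The helper name `list_node_reports` is hypothetical.

```shell
# Hypothetical helper: list only the per-node sos report tarballs inside a
# collection archive. Assumes GNU tar and compressed per-node tarballs.
list_node_reports() {
  tar -tf "$1" | grep -E '\.tar\.(xz|gz)$'
}

# Assumed usage with the archive name from the output above (not run here):
#   list_node_reports /var/tmp/sos-collector-00009999-2021-05-31-fuzyr.tar.xz
```

The number of lines printed can be compared with the number of nodes listed at the start of the collection.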

Root Cause

With the new sos tool, which is provided since Red Hat Enterprise Linux 8.4, it is possible to collect a set of sos reports from a number of nodes.

Diagnostic Steps

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

  • Several issues exist in previous versions of sos, so the recommendation is to update sos:

    # yum update sos
    
    # dnf update sos
    
  • Check the sos collect help for additional parameters:

    $ sos collect --help
    
  • If the update is not possible and the following message is shown:

    [openshift-worker-1:create_sos_container] Could not start container after create: 
    Error: no container with name or ID sos-collector-tmp found: no such container
    

    As explained in "OpenShift 4 - sos collect fails because podman cannot pull support-tools container due to missing credentials", pull the support-tools image manually via podman on all nodes with the following command:

    # podman pull --authfile /var/lib/kubelet/config.json registry.redhat.io/rhel8/support-tools
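
Running the pull on every node by hand is tedious, so the command can be generated per node. The following is a dry-run sketch that only prints the commands. It assumes SSH access as the `core` user with passwordless sudo, as in the SSH-based `sos collect` example; the helper name `print_pull_commands` is hypothetical.

```shell
# Hypothetical helper, not part of sos: print the SSH command that would pull the
# support-tools image on each node given as an argument (dry run only).
IMAGE=registry.redhat.io/rhel8/support-tools
AUTHFILE=/var/lib/kubelet/config.json

print_pull_commands() {
  for node in "$@"; do
    echo "ssh core@$node sudo podman pull --authfile $AUTHFILE $IMAGE"
  done
}

print_pull_commands openshift-worker-0.example.com openshift-worker-1.example.com
```

After reviewing the printed lines, they can be executed one by one or piped to a shell.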
    
