How to create and gather sos reports in bulk for an OpenShift 4 cluster?
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
- Red Hat Enterprise Linux (RHEL) (as jumpserver)
- 8.4+
- 9
- sos
- 4
Issue
- Is it possible to create and gather sos reports in bulk for an OCP 4 cluster?
- How to collect sos report from several OpenShift 4 nodes at the same time?
Resolution
With the sos 4 tool, which is provided starting with Red Hat Enterprise Linux 8.4, it is possible to collect a set of sos reports from several nodes at once. For node types other than OpenShift ones, refer to "How do I collect an sos report from multiple nodes at the same time".
Prerequisites
- A RHEL 8.4 (or newer) or RHEL 9 jumpserver or bastion host which can connect to all worker nodes via SSH and/or which holds the `kubeconfig` file and has the `oc` CLI installed and configured.
- Up-to-date `sos` version.
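The prerequisites above can be verified on the jumpserver before starting a collection. The following is a minimal sketch, assuming a Bash shell on an RPM-based host; the function name `check_prereqs` is hypothetical and not part of `sos`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sos): check the jumpserver prerequisites.
check_prereqs() {
    local rc=0

    # The sos package must be installed; rpm prints the installed version.
    if ! rpm -q sos; then
        echo "sos is not installed" >&2
        rc=1
    fi

    # oc is only needed for the kubeconfig-based collection.
    if command -v oc >/dev/null 2>&1; then
        # `oc whoami` fails when there is no valid session to a cluster.
        if ! oc whoami >/dev/null 2>&1; then
            echo "oc is installed but not connected to a cluster" >&2
            rc=1
        fi
    else
        echo "oc not found: only the SSH-based collection is possible" >&2
    fi

    return "$rc"
}
```

Calling `check_prereqs` returns a non-zero status when `sos` is missing or when `oc` is present but cannot reach a cluster.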
Collect the sos reports
When kubeconfig file is available
If the `oc` binary is installed and connected to an OpenShift cluster, the sos reports for all the master nodes can be collected by running the following command. The `--all-logs` parameter can be removed if the generated sos reports are too big; note that reports from several nodes are collected this way, so the total size will be large:
$ sos collect --cluster-type ocp --no-local \
-e openshift -e openshift_ovn -e openvswitch -e podman -e crio \
-k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on -k networking.ethtool-namespaces=off --all-logs --plugin-timeout=600
Note: to collect sos reports from specific nodes, use the `--nodes=` parameter with a comma-separated list of node names (e.g. `--nodes=openshift-worker-1.example.com,openshift-worker-0.example.com`).
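When the node names are not known in advance, the comma-separated list for `--nodes=` can be built from the cluster itself. The following is a sketch only, assuming the worker nodes carry the standard `node-role.kubernetes.io/worker` label; the function name `nodes_csv` is hypothetical.

```shell
# Hypothetical helper: print the names of all nodes matching a label selector
# as a comma-separated list suitable for the --nodes= parameter.
nodes_csv() {
    local selector="${1:-node-role.kubernetes.io/worker}"
    # `-o name` prints one "node/<name>" entry per line; strip the prefix
    # and join the remaining lines with commas.
    oc get nodes -l "$selector" -o name \
        | sed 's|^node/||' \
        | paste -sd, -
}

# Possible usage (not executed here):
#   sos collect --cluster-type ocp --no-local --nodes="$(nodes_csv)"
```

Passing a different selector as the first argument, for example `node-role.kubernetes.io/master`, selects another group of nodes.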
When kubeconfig file is not available, or a node is not yet part of the OpenShift cluster
If the `kubeconfig` file is not available, it is also possible to pass additional parameters to the `sos collect` command so that it connects through SSH: the user to connect to the nodes, the SSH key, and the comma-separated list of nodes to collect the sos reports from:
$ sos collect --case-id 00009999 --no-local \
--nopasswd-sudo \
--ssh-user core -i ~/.ssh/openshift.key \
--nodes=openshift-worker-1.example.com,openshift-worker-0.example.com,openshift-master-0.example.com,openshift-master-1.example.com,openshift-master-2.example.com \
-e openshift -e openshift_ovn -e openvswitch -e podman -e crio \
-k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on -k networking.ethtool-namespaces=off --all-logs --plugin-timeout=600
Note: Replace items in the above command as needed. If any of the plugins times out, or not all the information is collected, it may be necessary to set or raise the parameter `--plugin-timeout=600` to increase the plugin timeout.
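Because the SSH-based collection depends on key-based access and passwordless `sudo` on every node, it can save time to verify both before starting. The following is a minimal sketch reusing the user, key, and node list format from the command above; the function name `check_ssh_nodes` is hypothetical.

```shell
# Hypothetical helper: check key-based SSH and passwordless sudo on each node.
# Usage: check_ssh_nodes <ssh-user> <key-file> <comma-separated-nodes>
check_ssh_nodes() {
    local user="$1" key="$2" failed=0 node
    # Split the comma-separated list into individual node names.
    for node in $(printf '%s' "$3" | tr ',' ' '); do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # `sudo -n true` fails if sudo would ask for a password.
        if ssh -n -i "$key" -o BatchMode=yes -o ConnectTimeout=10 \
               "$user@$node" sudo -n true; then
            echo "$node: OK"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_ssh_nodes core ~/.ssh/openshift.key openshift-worker-0.example.com,openshift-master-0.example.com` prints one status line per node and returns a non-zero status if any node failed.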
Example
An output similar to the following one will be shown:
sos-collector (version 4.2)
This utility is used to collect sos reports from multiple nodes simultaneously.
Remote connections are made and/or maintained to those nodes via well-known
transport protocols such as SSH.
An archive of sos report tarballs collected from the nodes will be generated in
/var/tmp/sos.igt7ewr5 and may be provided to an appropriate support
representative.
The generated archive may contain data considered sensitive and its content
should be reviewed by the originating organization before being passed to any
third party.
No configuration changes will be made to the system running this utility or
remote systems that it connects to.
Press ENTER to continue, or CTRL-C to quit
sos-collector ASSUMES that SSH keys are installed on all nodes unless the
--password option is provided.
Connected to openshift-master-0.example.com, determining cluster type...
The following is a list of nodes to collect from:
openshift-master-0.example.com
openshift-master-1.example.com
openshift-master-2.example.com
openshift-worker-0.example.com
openshift-worker-1.example.com
Press ENTER to continue with these nodes, or press CTRL-C to quit
Connecting to nodes...
Beginning collection of sosreports from 5 nodes, collecting a maximum of 4 concurrently
openshift-worker-1 : Generating sosreport...
openshift-master-0 : Generating sosreport...
openshift-worker-0 : Generating sosreport...
openshift-master-2 : Generating sosreport...
openshift-worker-0 : Retrieving sosreport...
openshift-worker-0 : Successfully collected sosreport
openshift-master-1 : Generating sosreport...
openshift-worker-1 : Retrieving sosreport...
openshift-worker-1 : Successfully collected sosreport
openshift-master-2 : Retrieving sosreport...
openshift-master-2 : Successfully collected sosreport
openshift-master-0 : Retrieving sosreport...
openshift-master-0 : Successfully collected sosreport
openshift-master-1 : Retrieving sosreport...
openshift-master-1 : Successfully collected sosreport
The following archive has been created. Please provide it to your support team.
/var/tmp/sos-collector-00009999-2021-05-31-fuzyr.tar.xz
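The resulting file is an archive of the sos report tarballs collected from the nodes, so it has to be unpacked once to reach the individual reports. The following is a small sketch, assuming GNU `tar` (and `xz` for `.tar.xz` archives) is available; the function name `unpack_collection` is hypothetical.

```shell
# Hypothetical helper: unpack a sos collect archive and list what it contains.
# Usage: unpack_collection <archive> <destination-directory>
unpack_collection() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    # GNU tar detects the compression format automatically when reading a file.
    tar -xf "$archive" -C "$dest"
    # List every unpacked file, one per line, in a stable order.
    find "$dest" -type f | sort
}
```

Each sos report tarball listed this way can then be extracted on its own with `tar -xf`.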
Root Cause
With the new sos tool, which has been provided since Red Hat Enterprise Linux 8.4, it is possible to collect a set of sos reports from a number of nodes.
Diagnostic Steps
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
- Several issues exist in previous versions of `sos`, so the recommendation is to update `sos`:
  # yum update sos
  # dnf update sos
- Check the `sos collect` help for additional parameters:
  $ sos collect --help
- If the update is not possible and the following message is shown:
  [openshift-worker-1:create_sos_container] Could not start container after create: Error: no container with name or ID sos-collector-tmp found: no such container
  As explained in "OpenShift 4 - sos collect fails because podman cannot pull support-tools container due to missing credentials", pull the `support-tools` image manually via `podman` on all nodes with the following command:
  # podman pull --authfile /var/lib/kubelet/config.json registry.redhat.io/rhel8/support-tools
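Pulling the image by hand on every node can be scripted from the jumpserver. The following is a sketch only, assuming SSH access with passwordless `sudo` as in the SSH-based collection; the function name `pull_support_tools` is hypothetical.

```shell
# Hypothetical helper: pull the support-tools image on every listed node.
# Usage: pull_support_tools <ssh-user> <key-file> <comma-separated-nodes>
pull_support_tools() {
    local user="$1" key="$2" node
    # Split the comma-separated list into individual node names.
    for node in $(printf '%s' "$3" | tr ',' ' '); do
        echo "Pulling support-tools on $node"
        # Same podman command as above, run remotely through sudo.
        ssh -n -i "$key" "$user@$node" \
            sudo podman pull --authfile /var/lib/kubelet/config.json \
            registry.redhat.io/rhel8/support-tools \
            || echo "$node: pull failed" >&2
    done
}
```

A failed pull on one node is reported on standard error and does not stop the loop, so the remaining nodes are still processed.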