How do I collect an sos report from multiple nodes at the same time?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 7, 8, 9
  • Red Hat Enterprise Linux High Availability
  • Pacemaker
  • Red Hat Virtualization 4
  • OpenShift Container Platform

Issue

  • How can multiple sos reports be generated simultaneously?
  • I have a multi-node or clustered environment, how can I collect an sos report from each node?

Resolution

  • Starting with RHEL 7.6, the sos-collector package has been added to the base repository to facilitate easier collection of sos reports from multiple nodes. This allows system administrators to execute a single command that generates a single archive containing sosreports from the desired nodes.
    • Note: sos-collector in RHEL 7.6 makes use of the python-paramiko SSH library. This may not be suitable for secure environments and you should verify that it is acceptable to use in your environment before using sos-collector.
    • In RHEL 8 and later, sos-collector does not use the python-paramiko library.
    • The sos-collector package does not need to be installed on every node. It only needs to be installed on the node it is being run from.
# yum install sos-collector
  • As of RHEL 8.4, sos-collector is part of the standard sos package. The separate sos-collector package is obsoleted by sos-4.0 and later.
# yum install sos
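
The version notes above can be condensed into a small helper. This is only a sketch: `collector_package` is a hypothetical function name, and the version boundaries it encodes (7.6 and 8.4) are taken from the text above rather than queried from any repository.

```shell
#!/bin/sh
# Hypothetical helper: print the package that provides multi-node collection
# for a RHEL version given as "major.minor" (or just "major").
collector_package() {
    major=${1%%.*}
    minor=${1#*.}
    # A bare major version such as "9" has no minor part; treat it as 0.
    [ "$minor" = "$1" ] && minor=0

    if [ "$major" -ge 9 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 4 ]; }; then
        # RHEL 8.4 and later: collection is built into the sos package.
        echo sos
    elif [ "$major" -eq 8 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 6 ]; }; then
        # RHEL 7.6 through 8.3: the standalone package is needed.
        echo sos-collector
    else
        echo "sos-collector is not available for RHEL $1" >&2
        return 1
    fi
}
```

On the node that will run the collection, the result could be passed to the installer, for example `yum install "$(collector_package 8.4)"`.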

Usage


For collecting sosreports from several nodes in an OpenShift 4 cluster, refer to [how to create and gather sosreports in bulk for an OpenShift 4 cluster](https://access.redhat.com/solutions/6087311).
  • Note: Prior to RHEL 8.4, the command to use is sos-collector. From RHEL 8.4 onward (sos version 4.0+), the command is sos collect, because this functionality was integrated directly into the sos utility.

sos collect may be run either from a node in the environment you wish to collect from, or from a workstation. The only requirement for running is that the system that runs sos collect has network connectivity to each node that an sos report will be collected from.

sos collect itself does not need to be run as the root user. However, it must either become the root user or use sudo on the remote nodes to generate the actual sos report.

Running in the environment


Running `sos collect` on a "primary" node (a node capable of listing the other nodes in the cluster) is straightforward, as cluster-type detection is done automatically:
# sos collect
[...]
Cluster type set to rhv

The following is a list of nodes to collect from:
	rhev-manager.example.com
	rhev-hypervisor1.example.com
	rhev-hypervisor2.example.com

From here, sos collect will open an SSH session to each remote node and run an sos report command on each of them. The exact sos report command run on each node depends on the options supplied by the user, the technology in use (RHV, pacemaker, etc.), and the node's individual sos report capabilities (in the case of varying versions of sos).

Running from a local workstation


As long as a system has network access to the nodes, it can be used to run `sos collect` and provide the final archive of sosreports on the local node. To do so, use the `--primary` option to specify a node that is capable of enumerating the other nodes in a cluster:
# sos collect --primary=rhev-manager.example.com
  • Note: prior to sos-4.2, the --primary option was called --master.
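
Because both the command name and the option name changed between releases, a wrapper script may want to pick them based on the installed version. The following is a minimal sketch: `build_collect_cmd` is a hypothetical helper, it expects a version in `major.minor` form, and it assumes that any sos version below 4.0 is paired with the standalone `sos-collector` command. It only prints the command line so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the collection command for a given sos version
# (major.minor[.patch]) and primary node.
build_collect_cmd() {
    sos_version=$1
    primary=$2
    major=${sos_version%%.*}
    rest=${sos_version#*.}
    minor=${rest%%.*}

    if [ "$major" -lt 4 ]; then
        # Assumption: before sos 4.0 the standalone sos-collector command is used.
        echo "sos-collector --master=$primary"
    elif [ "$major" -eq 4 ] && [ "$minor" -lt 2 ]; then
        # sos 4.0 and 4.1: integrated command, old option name.
        echo "sos collect --master=$primary"
    else
        # sos 4.2 and later: integrated command, new option name.
        echo "sos collect --primary=$primary"
    fi
}
```

For example, `build_collect_cmd 4.1 rhev-manager.example.com` prints a command line that uses `--master`, while version 4.2 yields `--primary`.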

Limiting the nodes to collect from


When `sos collect` connects to, or is run on, a primary node, it collects a full list of all nodes in the environment. To limit which nodes are collected from, use the `--nodes` option. This option accepts hostnames, IP addresses, or shell-style regexes to identify nodes. For example, the following limits collection to two specific nodes (rhev1 and 10.10.10.10) and any node matching the `rhev-h*` regex:
# sos collect --nodes=rhev1.example.com,rhev-h*.example.com,10.10.10.10
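
To preview which hostnames a given pattern list would select, the matching can be approximated locally with the shell's own pattern matching. This is an illustration only: `filter_nodes` is a hypothetical helper, and it uses POSIX `case` patterns, which may not behave identically to the matching that sos collect performs internally.

```shell
#!/bin/sh
# Hypothetical helper: print each candidate node that matches at least one
# pattern in a comma-separated list. Runs in a subshell so that the IFS and
# globbing changes do not leak into the caller.
filter_nodes() (
    patterns=$1
    shift
    IFS=,
    set -f    # stop patterns such as rhev-h* from expanding against local files
    for node in "$@"; do
        for pat in $patterns; do
            case $node in
                # Unquoted $pat is treated as a shell pattern, not a literal.
                $pat) echo "$node"; break ;;
            esac
        done
    done
)
```

Calling `filter_nodes 'rhev1.example.com,rhev-h*.example.com,10.10.10.10'` followed by the full node list prints only the nodes that the pattern list selects, one per line.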

Specifying a static list of nodes


If cluster detection or node enumeration is failing, or if you simply want to collect from an arbitrary list of nodes, pass the complete list of nodes to the `--nodes` option. Users may also use the `--cluster-type` option to skip querying the cluster software if desired:
# sos collect --nodes=foo.example.com,bar.example.com,10.10.10.10 --cluster-type=none

When using the --nodes option in this way, regex/glob notations are not supported. Use of regexes or globs for the --nodes option is only valid when restricting a list of addresses returned by a cluster.
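
When the static list is long, it can be kept in a file and converted into the comma-separated form that `--nodes` expects. The sketch below assumes a hypothetical file with one hostname or IP address per line, where blank lines and lines starting with `#` are ignored; `nodes_from_file` is an illustrative name, not part of sos.

```shell
#!/bin/sh
# Hypothetical helper: turn a file with one node per line into a
# comma-separated list suitable for the --nodes option.
nodes_from_file() {
    # Drop blank lines and '#' comments, then join the remaining lines with commas.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | paste -s -d, -
}
```

The result could then be used as `sos collect --nodes="$(nodes_from_file nodes.txt)" --cluster-type=none`, where `nodes.txt` is an assumed file name.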

SSH keys, passwords, and users


`sos collect` defaults to attempting to use SSH keys to connect, and by default all keys in the running user's keyring are tried.

To use a password to connect to the nodes, use the --password option. This will then prompt the user for the SSH password. It is assumed that this password is valid for all nodes discovered; if this is not the case, users may also use the --password-per-node option to have the utility individually prompt for each node's password. Use of SSH keys is strongly recommended.

Additionally, by default the root user is used for SSH sessions since sos report must run as root. Users can change this with the --ssh-user option. Doing so will cause sos collect to prompt for a sudo password for the specified user, unless the --become option is also specified, in which case you will be prompted for the root user password on the remote nodes. For remote users that have passwordless sudo configured, use the --nopasswd-sudo option to skip these prompts.
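
The combinations described above can be summarized in a small sketch. `ssh_opts` is a hypothetical helper that only prints the resulting command line; the mode names `sudo`, `become`, and `nopasswd` are labels invented for this example, while the options themselves are the ones named in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the sos collect command line for a non-root
# SSH user and a chosen privilege escalation mode.
ssh_opts() {
    user=$1
    escalation=$2
    opts="--ssh-user=$user"
    case $escalation in
        sudo)     ;;                               # prompts for the user's sudo password
        become)   opts="$opts --become" ;;         # prompts for the remote root password
        nopasswd) opts="$opts --nopasswd-sudo" ;;  # passwordless sudo, no prompt
        *)
            echo "unknown escalation mode: $escalation" >&2
            return 1
            ;;
    esac
    echo "sos collect $opts"
}
```

For instance, `ssh_opts admin nopasswd` prints `sos collect --ssh-user=admin --nopasswd-sudo`, where `admin` stands in for whatever remote account is actually used.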

Using Cluster Options


`sos collect` supports multiple types of clusters, and options for those clusters that affect how `sos collect` will function. You can view the available options using `sos collect -l`:
# sos collect -l

The following cluster options are available:

Cluster         Option Name     Type       Default    Description
rhv             no-database     bool       False      Do not collect a database dump
rhv             cluster         str                   Only collect from hosts in this cluster
rhv             datacenter      str                   Only collect from hosts in this datacenter
rhv             no-hypervisors  bool       False      Do not collect from hypervisors
pacemaker       online          bool       True       Collect nodes listed as online
pacemaker       offline         bool       True       Collect nodes listed as offline
openshift       label           str                   Filter node list to those with matching label
openshift       role            str                   Filter node list to those with matching role

These options can be enabled using the syntax -c cluster_name.option_name=value. For example, to disable database collection for a RHV environment, use the following:

# sos collect -c rhv.no-database=True

Modifying the sosreport command on remote nodes


`sos collect` has tight integration with the `sos report` command, and most options that can be used for `sos report` can be used in the exact same manner for `sos collect`. For instance, running `sos collect -e ovirt -n logs` will in turn enable the ovirt plugin and disable the logs plugin on each node.

sos collect also supports sos plugin options using the same format as sos report, e.g. -k plugin_name.option_name=value.

Additionally, sos collect is capability aware on a per-node basis. This means that specifying an sos report option that is invalid on a given node (for example, when varying versions of sos are installed across the nodes) will not prevent the collection of an sos report from that node; the invalid options are simply filtered out.

