How do I collect an sos report from multiple nodes at the same time?
Environment
- Red Hat Enterprise Linux 7, 8, 9
- Red Hat Enterprise Linux High Availability
- Pacemaker
- Red Hat Virtualization 4
- OpenShift Container Platform
Issue
- How can multiple sos reports be generated simultaneously?
- I have a multi-node or clustered environment, how can I collect an sos report from each node?
Resolution
- Starting with RHEL 7.6, the `sos-collector` package has been added to the base repository to facilitate easier collection of sos reports from multiple nodes. This allows system administrators to execute a single command that generates a single archive containing sosreports from the desired nodes.
  - Note: sos-collector in RHEL 7.6 makes use of the `python-paramiko` SSH library. This may not be suitable for secure environments and you should verify that it is acceptable to use in your environment before using sos-collector.
  - In RHEL 8 and later, sos-collector does not use the `python-paramiko` library.
  - The `sos-collector` package does not need to be installed on every node. It only needs to be installed on the node it is being run from.
# yum install sos-collector
- As of RHEL 8.4 and later, `sos-collector` is now part of the standard `sos` package. The `sos-collector` package is obsoleted by `sos-4.0` and later.
# yum install sos
Usage
For collecting sosreports from several nodes from an OpenShift 4 cluster, refer to [how to create and gather sosreports in bulk for an OpenShift 4 cluster](https://access.redhat.com/solutions/6087311).
- Note: Prior to RHEL 8.4, the command to use is `sos-collector`. However, as of RHEL 8.4 and later (`sos` version 4.0+), the command is `sos collect`, as this functionality was integrated directly into the `sos` utility.
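The version rule in the note above can be sketched as a small shell helper. This is only an illustration: the function name `collector_command` is hypothetical, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: print the collection command for a given sos version.
# Per the note above, sos 4.0 and later use "sos collect"; older releases
# use the separate "sos-collector" command.
collector_command() {
    version="$1"
    # Version-sort "4.0" against the supplied version. If "4.0" sorts first
    # (or the two are equal), the supplied version is 4.0 or newer.
    lowest=$(printf '%s\n%s\n' "4.0" "$version" | sort -V | head -n 1)
    if [ "$lowest" = "4.0" ]; then
        echo "sos collect"
    else
        echo "sos-collector"
    fi
}
```

On an RPM-based system the installed version could be supplied with something like `collector_command "$(rpm -q --qf '%{VERSION}' sos)"`.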
sos collect may be run either from a node in the environment you wish to collect from, or from a workstation. The only requirement for running is that the system that runs sos collect has network connectivity to each node that an sos report will be collected from.
sos collect itself does not need to be run as the root user. However, it needs to become the root user, or use sudo, on the remote nodes to generate the actual sos report.
Running in the environment
Running `sos collect` on a "primary" node (a node capable of listing the other nodes in the cluster) is straightforward, as cluster-type detection is done automatically:
# sos collect
[...]
Cluster type set to rhv
The following is a list of nodes to collect from:
rhev-manager.example.com
rhev-hypervisor1.example.com
rhev-hypervisor2.example.com
From here, sos collect will open an SSH session to each remote node and run an sos report command on each of them. The sos report command run on each node is controlled by the user, the technology in use (RHV, pacemaker, etc.), and the node's individual sosreport capabilities (in the case of varying versions of sos).
Running from a local workstation
As long as a system has network access to the nodes, it can be used to run `sos collect` and provide the final archive of sosreports on the local node. To do so, use the `--primary` option to specify a node that is capable of enumerating the other nodes in a cluster:
# sos collect --primary=rhev-manager.example.com
- Note: prior to `sos-4.2`, the `--primary` option was called `--master`.
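For scripts that must work across sos versions, the option rename can be handled with a helper along these lines. This is a sketch under assumptions: `primary_flag` is a hypothetical name, and `sort -V` (GNU coreutils) is assumed to be available.

```shell
# Hypothetical helper: print the option that names the primary node.
# Per the note above, sos 4.2 and later use --primary; earlier versions
# used --master.
primary_flag() {
    version="$1"
    host="$2"
    # If "4.2" sorts first (or equal), the supplied version is 4.2 or newer.
    lowest=$(printf '%s\n%s\n' "4.2" "$version" | sort -V | head -n 1)
    if [ "$lowest" = "4.2" ]; then
        echo "--primary=$host"
    else
        echo "--master=$host"
    fi
}
```

For example, `primary_flag 4.1 rhev-manager.example.com` prints `--master=rhev-manager.example.com`.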
Limiting the nodes to collect from
When `sos collect` connects to, or is run on, a primary node, it collects a full list of all nodes in the environment. To limit which nodes are collected from, use the `--nodes` option. This option accepts hostnames, IP addresses, or shell-style regexes to identify nodes. For example, the following limits collection to two specific nodes (rhev1 and 10.10.10.10) and any node matching the `rhev-h*` regex:
# sos collect --nodes=rhev1.example.com,rhev-h*.example.com,10.10.10.10
Specifying a static list of nodes
If cluster determination or node enumeration from the cluster is failing, or if you simply want to collect from an arbitrary list of nodes, the `--nodes` option may also be used to supply a static list. Users may also use the `--cluster-type` option to skip querying the cluster software if desired:
# sos collect --nodes=foo.example.com,bar.example.com,10.10.10.10 --cluster-type=none
When using the --nodes option in this way, regex/glob notations are not supported. Use of regexes or globs for the --nodes option is only valid when restricting a list of addresses returned by a cluster.
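Because patterns are not supported in a static list, a wrapper script may want to reject them before calling `sos collect`. The following is a minimal sketch; `static_nodes_arg` is a hypothetical helper name, and the set of characters treated as glob characters (`*`, `?`, `[`, `]`) is an assumption.

```shell
# Hypothetical helper: build the options for a static node list.
# Entries containing glob characters are rejected, since patterns are only
# valid when restricting a list of nodes returned by a cluster.
static_nodes_arg() {
    joined=""
    for node in "$@"; do
        case "$node" in
            *\**|*\?*|*\[*|*\]*)
                echo "pattern not allowed in static list: $node" >&2
                return 1
                ;;
        esac
        # Append with a comma separator, except before the first entry.
        joined="${joined:+$joined,}$node"
    done
    # An empty list is an error as well.
    [ -n "$joined" ] || return 1
    echo "--nodes=$joined --cluster-type=none"
}
```

The output can then be passed along unquoted, for example `sos collect $(static_nodes_arg foo.example.com bar.example.com 10.10.10.10)`, which is safe here because hostnames and IP addresses contain no whitespace.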
SSH keys, passwords, and users
`sos collect` defaults to attempting to use SSH keys to connect, and by default all keys in the running user's keyring are tried.
To use a password to connect to the nodes, use the --password option. This will then prompt the user for the SSH password. It is assumed that this password is valid for all nodes discovered; if this is not the case, users may also use the --password-per-node option to have the utility individually prompt for each node's password. Use of SSH keys is strongly recommended.
Additionally, by default the root user is used for SSH sessions since sos report must run as root. Users can change this with the --ssh-user option. Doing so will cause sos collect to prompt for a sudo password for the specified user, unless the --become option is also specified, in which case you will be prompted for the root user password on the remote nodes. For remote users that have passwordless sudo configured, use the --nopasswd-sudo option to skip these prompts.
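The three non-root combinations described above can be summarized in a small helper. This is only a sketch: the function name `ssh_args` and its mode names are hypothetical, while the options themselves are the ones named in this section.

```shell
# Hypothetical helper: print the options for connecting as a non-root user.
#   ssh_args USER sudo      prompt for USER's sudo password
#   ssh_args USER become    prompt for the remote root password instead
#   ssh_args USER nopasswd  USER has passwordless sudo, so skip the prompts
ssh_args() {
    user="$1"
    mode="${2:-sudo}"
    case "$mode" in
        sudo)     echo "--ssh-user=$user" ;;
        become)   echo "--ssh-user=$user --become" ;;
        nopasswd) echo "--ssh-user=$user --nopasswd-sudo" ;;
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, with a hypothetical remote account named `sosadmin` that has passwordless sudo, `sos collect $(ssh_args sosadmin nopasswd)` expands to `sos collect --ssh-user=sosadmin --nopasswd-sudo`.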
Using Cluster Options
`sos collect` supports multiple types of clusters, and options for those clusters that affect how `sos collect` will function. You can view the available options using `sos collect -l`:
# sos collect -l
The following cluster options are available:
Cluster    Option Name     Type  Default  Description
rhv        no-database     bool  False    Do not collect a database dump
rhv        cluster         str            Only collect from hosts in this cluster
rhv        datacenter      str            Only collect from hosts in this datacenter
rhv        no-hypervisors  bool  False    Do not collect from hypervisors
pacemaker  online          bool  True     Collect nodes listed as online
pacemaker  offline         bool  True     Collect nodes listed as offline
openshift  label           str            Filter node list to those with matching label
openshift  role            str            Filter node list to those with matching role
These options can be enabled using the syntax -c cluster_name.option_name=value. For example, to disable database collection for a RHV environment, use the following:
# sos collect -c rhv.no-database=True
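To see which `cluster_name.option_name` keys exist for one cluster type, the listing can be filtered with `awk`. This is a sketch that assumes the `sos collect -l` output has the whitespace-separated layout shown above, with the cluster name in the first column and the option name in the second; `list_cluster_options` is a hypothetical helper name.

```shell
# Hypothetical helper: read an option listing on stdin and print the
# "cluster.option" keys for the cluster type given as the first argument.
list_cluster_options() {
    # Keep only rows whose first column matches the requested cluster type.
    awk -v cluster="$1" '$1 == cluster { print $1 "." $2 }'
}
```

A possible use is `sos collect -l | list_cluster_options pacemaker`, whose output lines can then be combined with a value in the `-c cluster_name.option_name=value` form.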
Modifying the sosreport command on remote nodes
`sos collect` has tight integration with the `sos report` command, and most options that can be used for `sos report` can be used in the exact same manner for `sos collect`. For instance, running `sos collect -e ovirt -n logs` will in turn enable the ovirt plugin and disable the logs plugin on each node.
sos collect also supports sos plugin options using the same format, for example `-k plugin_name.option_name=value`.
Additionally, sos collect is capability aware on a per-node basis. If an sos report option is invalid on a given node (for example, because the nodes run varying versions of sos), this does not prevent the collection of an sos report from that node; the invalid options are simply filtered out.