What data to provide if sos report fails in OpenShift Container Platform 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
Issue
- Sos report hangs.
- Sos report will not run to completion on system.
- Sos report command is not working.
- How to create a sos report manually.
- Need to provide the information related to the system without sos report.
- How do I collect data if sos report is not installed?
- I'm attempting to run a sos report of the server, but it appears to be hanging.
- How to collect the required logs from the server without running sos report.
- I want to capture the system configuration periodically without sos report.
Resolution
The following information has been provided by Red Hat, but is outside the scope of the posted Service Level Agreements and support procedures (Production Support - Red Hat Customer Portal). The information is provided as-is, and any configuration settings or installed applications made from the information in this article could make the Operating System unsupported by Red Hat Global Support Services. The intent of this article is to provide information to accomplish the system's needs. Use of the information in this article is at the user's own risk.
NOTE: The following script is not an official tool to collect information. The script is not supported and not maintained. Users can use the script for their own needs. The data collected with the script is not a subset of sos report, and cannot be used in place of data collected with the sos report command provided by the sos package.
Also check how to provide an sos report from a RHEL CoreOS OpenShift 4 node.
The following instructions will collect data within nodes without SSH in OCP 4.
- Some of the data that sos report collects can alternatively be collected using the following script. It may take 5 to 10 minutes to run, depending on the size of the logs.
- First, display the list of nodes in the cluster:

    $ oc get nodes
    NAME                                         STATUS   ROLES    AGE    VERSION
    ip-10-0-131-87.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
    ip-10-0-132-143.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
    ip-10-0-145-113.eu-west-3.compute.internal   Ready    master   119d   v1.14.6+8e46c0036
    ip-10-0-147-108.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
    ip-10-0-161-51.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
    ip-10-0-163-177.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
- Then, create a debug session with oc debug node/<node name>, in this case oc debug node/ip-10-0-132-143.eu-west-3.compute.internal. The debug session will spawn a pod using the image registry.redhat.io/rhel7/support-tools:

    $ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal
    Starting pod/ip-10-0-132-143eu-west-3computeinternal-debug ...
    To use host binaries, run `chroot /host`
    Pod IP: 10.0.132.143
    If you don't see a command prompt, try pressing enter.
    sh-4.2# cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.7 (Maipo)
    sh-4.2#
- Once in the debug session, use chroot to change the apparent root directory to that of the underlying host:

    sh-4.2# chroot /host bash
    [root@ip-10-0-132-143 /]# cat /etc/redhat-release
    Red Hat Enterprise Linux CoreOS release 4.2
    [root@ip-10-0-132-143 /]# # Change to root home dir to create file on the next step
    [root@ip-10-0-132-143 /]# cd ~
- Create a file named man_sosreport.sh, copy the script given below, and save it in the man_sosreport.sh file.

    #!/bin/bash
    export LANG=C

    # If this script hangs, un-comment the below two entries and note the command that the script hangs on. Then comment out that command and re-run the script.
    # set -x
    # set -o verbose

    # Start from a clean working directory; stop if it cannot be created.
    [[ -d /tmp/sosreport ]] && rm -rf /tmp/sosreport
    mkdir /tmp/sosreport && cd /tmp/sosreport && mkdir -p var/log etc/lvm etc/sysconfig network storage sos_commands/networking || exit 1

    echo -e "Gathering system information..."
    hostname &> hostname
    cp -a /etc/redhat-release ./etc/ 2>> error_log
    uptime &> uptime

    echo -e "Gathering application information..."
    chkconfig --list &> chkconfig
    top -bn1 &> top_bn1
    service --status-all &> service_status_all
    date &> date
    ps auxww &> ps_auxww
    ps -elf &> ps_-elf
    rpm -qa --last &> rpm-qa
    systemctl status --all --no-pager &> systemctl_status_-all
    echo -e "Running 'rpm -Va'. This may take a moment."
    rpm -Va &> rpm-Va

    echo -e "Gathering memory information..."
    free -m &> free
    vmstat 1 10 &> vmstat

    echo -e "Gathering network information..."
    ifconfig &> ./network/ifconfig
    netstat -s &> ./network/netstat_-s
    netstat -agn &> ./network/netstat_-agn
    netstat -neopa &> ./network/netstat_-neopa
    route -n &> ./network/route_-n
    for i in $(ls /etc/sysconfig/network-scripts/{ifcfg,route,rule}-*) ; do echo -e "$i\n----------------------------------"; cat "$i"; echo " "; done &> ./sos_commands/networking/ifcfg-files
    for i in $(ifconfig | grep "^[a-z]" | cut -f 1 -d " "); do echo -e "$i\n-------------------------" ; ethtool "$i"; ethtool -k "$i"; ethtool -S "$i"; ethtool -i "$i"; echo -e "\n" ; done &> ./sos_commands/networking/ethtool.out
    cp /etc/sysconfig/network ./sos_commands/networking/ 2>> error_log
    cp /etc/sysconfig/network-scripts/ifcfg-* ./sos_commands/networking/ 2>> error_log
    cp /etc/sysconfig/network-scripts/route-* ./sos_commands/networking/ 2>> error_log
    # stdout goes to the output file, errors go to error_log
    cat /proc/net/bonding/bond* > ./sos_commands/networking/proc-net-bonding-bond 2>> error_log
    iptables --list --line-numbers &> ./sos_commands/networking/iptables_--list_--line-numbers
    ip route show table all &> ./sos_commands/networking/ip_route_show_table_all
    ip link &> ./sos_commands/networking/ip_link

    echo -e "Gathering Storage/Filesystem information..."
    df -l &> df
    fdisk -l &> fdisk
    parted -l &> parted
    cp -a /etc/fstab ./etc/ 2>> error_log
    cp -a /etc/lvm/lvm.conf ./etc/lvm/ 2>> error_log
    cp -a /etc/lvm/backup/ ./etc/lvm/ 2>> error_log
    cp -a /etc/lvm/archive/ ./etc/lvm/ 2>> error_log
    cp -a /etc/multipath.conf ./etc/ 2>> error_log
    cat /proc/mounts &> mount
    iostat -tkx 1 10 &> iostat_-tkx_1_10
    parted -l &> storage/parted_-l
    vgdisplay -v &> storage/vgdisplay
    lvdisplay &> storage/lvdisplay
    pvdisplay &> storage/pvdisplay
    pvs -a -v &> storage/pvs
    vgs -v &> storage/vgs
    lvs -o +devices &> storage/lvs
    multipath -v4 -ll &> storage/multipath_ll
    pvscan -vvvv &> storage/pvscan
    vgscan -vvvv &> storage/vgscan
    lvscan -vvvv &> storage/lvscan
    lsblk &> storage/lsblk
    lsblk -t &> storage/lsblk_t
    dmsetup info -C &> storage/dmsetup_info_c
    dmsetup status &> storage/dmsetup_status
    dmsetup table &> storage/dmsetup_table
    # The original ran lsscsi without saving its output; it is captured to a file here.
    lsscsi &> storage/lsscsi
    ls -lahR /dev &> storage/dev

    echo -e "Gathering kernel information..."
    cp -a /etc/security/limits.conf ./etc/ 2>> error_log
    cp -a /etc/sysctl.conf ./etc/ 2>> error_log
    ulimit -a &> ulimit
    cat /proc/slabinfo &> slabinfo
    cat /proc/zoneinfo &> zoneinfo
    cat /proc/interrupts &> interrupts
    cat /proc/iomem &> iomem
    cat /proc/ioports &> ioports
    slabtop -o &> slabtop_-o
    uname -a &> uname
    sysctl -a &> sysctl_-a
    lsmod &> lsmod
    cp -a /etc/modprobe.conf ./etc/ 2>> error_log
    cp -a /etc/sysconfig/* ./etc/sysconfig/ 2>> error_log
    # Append both stdout and stderr of modinfo to the modinfo file.
    for MOD in $(lsmod | grep -v "Used by" | awk '{ print $1 }'); do modinfo "$MOD" >> modinfo 2>&1; done
    ipcs -a &> ipcs_-a
    ipcs -s | awk '/^0x/ {print $2}' | while read semid; do ipcs -s -i "$semid"; done &> ipcs_-s_verbose
    sar -A &> sar_-A
    cp -a /var/log/dmesg dmesg 2>> error_log
    dmesg &> dmesg_now

    echo -e "Gathering hardware information..."
    dmidecode &> dmidecode
    lspci -vvv &> lspci_-vvv
    lspci &> lspci
    cat /proc/meminfo &> meminfo
    cat /proc/cpuinfo &> cpuinfo

    echo -e "Gathering kdump information..."
    cp -a /etc/kdump.conf ./etc/ 2>> error_log
    ls -laR /var/crash &> ls-lar-var-crash
    ls -1 /var/crash | while read n; do mkdir -p "var/crash/${n}"; cp -a "/var/crash/${n}"/vmcore-dmesg* "var/crash/${n}/" 2>> error_log; done

    echo -e "Gathering kubelet information..."
    mkdir openshift
    journalctl --no-pager --unit kubelet &> openshift/journalctl_--no-pager_--unit_kubelet
    systemctl status kubelet --no-pager &> openshift/systemctl_status_kubelet

    echo -e "Gathering container related information..."
    mkdir {container,container/container_info}
    journalctl --no-pager --unit crio &> container/journalctl_--no-pager_--unit_crio
    systemctl status crio --no-pager &> container/systemctl_status_crio
    crio config &> container/crio_config
    crictl stats &> container/stats
    crictl info &> container/info
    crictl pods &> container/pods
    crictl pods -v &> container/pods_-v
    crictl ps -a &> container/ps_-a
    crictl image &> container/image_list
    crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do mkdir "container/container_info/$id"; done
    crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do crictl inspect "$id" &> "container/container_info/$id/${id}_inspect"; done
    crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do crictl logs "$id" &> "container/container_info/$id/${id}_logs"; done

    echo -e "Gathering logs..."
    cp -a /var/log/{containers*,pods*,message*,secure*,boot*,cron*,yum*,Xorg*,sa,rhsm,audit,dmesg} ./var/log/ 2>> error_log
    cp -a /etc/*syslog.conf ./etc/ 2>> error_log

    echo -e "Compressing files..."
    tar -cjf /tmp/sosreport.tar.bz2 ./
    echo -e "Script complete."
- Run the script.

    [root@ip-10-0-132-143 /]# chmod +x man_sosreport.sh
    [root@ip-10-0-132-143 /]# ./man_sosreport.sh
- Attach /tmp/sosreport.tar.bz2 to the support case.
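Before attaching the archive, it can help to confirm that it is readable and to record a checksum. The following is a minimal sketch and is not part of the script above. The function name check_archive is hypothetical, and the sketch assumes that GNU tar and sha256sum are available in the shell where it runs.

```shell
#!/bin/bash
# check_archive is a hypothetical helper, not part of man_sosreport.sh.
# It returns non-zero if the archive is missing, empty, or unreadable.
check_archive() {
    local archive="$1"
    local listing

    # -s is true only when the file exists and is larger than zero bytes.
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi

    # GNU tar detects the compression type on read; listing fails on a corrupt file.
    if ! listing=$(tar -tf "$archive" 2>/dev/null); then
        echo "unreadable archive: $archive" >&2
        return 1
    fi

    # Report the number of entries and a checksum that can be quoted in the case.
    echo "entries: $(printf '%s\n' "$listing" | grep -c .)"
    sha256sum "$archive"
}

# Example call on the node (left commented so that sourcing this file has no side effects):
# check_archive /tmp/sosreport.tar.bz2
```

A non-zero return value suggests re-running the collection script before uploading anything.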
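The comment at the top of the script suggests finding a hanging command with set -x and then commenting it out. One possible alternative, sketched here under stated assumptions, is to wrap each collection command in timeout from coreutils so that a hung command is stopped and logged instead of blocking the run. The names run_step and STEP_TIMEOUT are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# STEP_TIMEOUT (hypothetical name): per-command time limit in seconds.
STEP_TIMEOUT="${STEP_TIMEOUT:-60}"

# run_step OUTFILE COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND with a time limit and writes stdout and stderr to OUTFILE.
run_step() {
    local outfile="$1"
    local rc=0
    shift

    # timeout sends SIGTERM after the limit; -k 5 sends SIGKILL 5 seconds later
    # if the command is still running.
    timeout -k 5 "$STEP_TIMEOUT" "$@" > "$outfile" 2>&1 || rc=$?

    # 124 means the limit was reached; 137 means SIGKILL was needed.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "TIMEOUT after ${STEP_TIMEOUT}s: $*" >> error_log
    fi
    return "$rc"
}

# Example replacements for lines in the script (left commented):
# run_step rpm-Va rpm -Va
# run_step storage/pvscan pvscan -vvvv
```

Because timeout executes a program rather than a shell construct, builtins such as ulimit and the for or while loops in the script would need to be passed as bash -c '...' or left unwrapped.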
Root Cause
Sos report is failing.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.