What data to provide if sos report fails in OpenShift Container Platform 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • Sos report hangs.
  • Sos report will not run to completion on system.
  • Sos report command is not working.
  • How to create a sos report manually.
  • Need to provide the information related to the system without sos report.
  • How do I collect data if sos report is not installed?
  • I'm attempting to run a sos report of the server, but it appears to be hanging.
  • How to collect the required logs from the server without running sos report.
  • I want to capture the system configuration periodically without sos report.

Resolution

The following information has been provided by Red Hat, but is outside the scope of the posted Service Level Agreements and support procedures (Production Support - Red Hat Customer Portal). The information is provided as-is, and any configuration settings or installed applications made from the information in this article could make the operating system unsupported by Red Hat Global Support Services. The intent of this article is to provide information to accomplish the system's needs. Use of the information in this article is at the user's own risk.
NOTE: The following script is not an official tool for collecting information. It is neither supported nor maintained, and users run it for their own needs. The data it collects is not a subset of an sos report and cannot be used in place of data collected with the sos report command provided by the sos package.

Also check how to provide an sos report from a RHEL CoreOS OpenShift 4 node.

The following instructions collect data from nodes in OCP 4 without using SSH.

  • Some of the data that sos report collects can alternatively be collected with the following script. The script may take 5-10 minutes to run, depending on the size of the logs.
    1. First, display the list of nodes in the cluster:

      $ oc get nodes
      NAME                                         STATUS   ROLES    AGE    VERSION
      ip-10-0-131-87.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
      ip-10-0-132-143.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
      ip-10-0-145-113.eu-west-3.compute.internal   Ready    master   119d   v1.14.6+8e46c0036
      ip-10-0-147-108.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
      ip-10-0-161-51.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
      ip-10-0-163-177.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
      
    2. Then, create a debug session with oc debug node/<node name>, in this case oc debug node/ip-10-0-132-143.eu-west-3.compute.internal. The debug session will spawn a pod using image registry.redhat.io/rhel7/support-tools:

      $ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal
      Starting pod/ip-10-0-132-143eu-west-3computeinternal-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.132.143
      If you don't see a command prompt, try pressing enter.
      sh-4.2# cat /etc/redhat-release 
      Red Hat Enterprise Linux Server release 7.7 (Maipo)
      sh-4.2#
      
    3. Once in the debug session, use chroot to change the apparent root directory to that of the underlying host:

      sh-4.2# chroot /host bash
      [root@ip-10-0-132-143 /]# cat /etc/redhat-release
      Red Hat Enterprise Linux CoreOS release 4.2
      [root@ip-10-0-132-143 /]#
      
      # Change to root home dir to create file on the next step
      [root@ip-10-0-132-143 /]# cd ~
      
    4. Create a file named man_sosreport.sh, copy the script given below and save it in the man_sosreport.sh file.

      #!/bin/bash
      export LANG=C
      
      # If this script hangs, un-comment the below two entries and note the command that the script hangs on.  Then comment out that command and re-run the script.
      # set -x
      # set -o verbose
      
      [[ -d /tmp/sosreport ]] && rm -rf /tmp/sosreport
      mkdir /tmp/sosreport && cd /tmp/sosreport && mkdir -p  var/log etc/lvm etc/sysconfig network storage sos_commands/networking
      
      echo -e "Gathering system information..."
      hostname &> hostname  
      cp -a /etc/redhat-release  ./etc/ 2>> error_log
      uptime &> uptime 
      
      echo -e "Gathering application information..."
      chkconfig --list &> chkconfig
      top -bn1 &> top_bn1
      service --status-all &> service_status_all
      date &> date
      ps auxww &> ps_auxww
      ps -elf &> ps_-elf
      rpm -qa --last &> rpm-qa
      systemctl status --all --no-pager &> systemctl_status_-all
      
      echo -e "Running 'rpm -Va'. This may take a moment."
      rpm -Va &> rpm-Va
      
      echo -e "Gathering memory information..."
      free -m &> free  
      vmstat 1 10 &> vmstat
      
      echo -e "Gathering network information..."
      ifconfig &> ./network/ifconfig  
      netstat -s &>./network/netstat_-s
      netstat -agn &> ./network/netstat_-agn
      netstat -neopa &> ./network/netstat_-neopa
      route -n &> ./network/route_-n
      for i in $(ls /etc/sysconfig/network-scripts/{ifcfg,route,rule}-*) ; do echo -e "$i\n----------------------------------"; cat $i;echo " ";  done &> ./sos_commands/networking/ifcfg-files    
      for i in $(ifconfig | grep "^[a-z]" | cut -f 1 -d " "); do echo -e "$i\n-------------------------" ; ethtool $i; ethtool -k $i; ethtool -S $i; ethtool -i $i;echo -e "\n" ; done &> ./sos_commands/networking/ethtool.out
      cp /etc/sysconfig/network ./sos_commands/networking/ 2>> error_log
      cp /etc/sysconfig/network-scripts/ifcfg-* ./sos_commands/networking/ 2>> error_log
      cp /etc/sysconfig/network-scripts/route-* ./sos_commands/networking/ 2>> error_log
      cat /proc/net/bonding/bond* &> ./sos_commands/networking/proc-net-bonding-bond 2>> error_log
      iptables --list --line-numbers &> ./sos_commands/networking/iptables_--list_--line-numbers
      ip route show table all &> ./sos_commands/networking/ip_route_show_table_all
      ip link &> ./sos_commands/networking/ip_link
      
      echo -e "Gathering Storage/Filesystem information..."
      df -l &> df
      fdisk -l &> fdisk
      parted -l &> parted
      cp -a /etc/fstab  ./etc/ 2>> error_log
      cp -a /etc/lvm/lvm.conf ./etc/lvm/ 2>> error_log
      cp -a /etc/lvm/backup/ ./etc/lvm/ 2>> error_log
      cp -a /etc/lvm/archive/ ./etc/lvm/ 2>> error_log
      cp -a /etc/multipath.conf ./etc/ 2>> error_log
      cat /proc/mounts &> mount  
      iostat -tkx 1 10 &> iostat_-tkx_1_10
      parted -l &> storage/parted_-l
      vgdisplay -v &> storage/vgdisplay
      lvdisplay &> storage/lvdisplay
      pvdisplay &> storage/pvdisplay
      pvs -a -v &> storage/pvs
      vgs -v &> storage/vgs
      lvs -o +devices &> storage/lvs
      multipath -v4 -ll &> storage/multipath_ll
      pvscan -vvvv &> storage/pvscan
      vgscan -vvvv &> storage/vgscan
      lvscan -vvvv &> storage/lvscan
      lsblk &> storage/lsblk
      lsblk -t &> storage/lsblk_t
      dmsetup info -C &> storage/dmsetup_info_c
      dmsetup status &>  storage/dmsetup_status 
      dmsetup table &>  storage/dmsetup_table
      lsscsi &> storage/lsscsi
      ls -lahR /dev &> storage/dev
      
      echo -e "Gathering kernel information..."
      cp -a /etc/security/limits.conf ./etc/ 2>> error_log
      cp -a /etc/sysctl.conf ./etc/ 2>> error_log
      ulimit -a &> ulimit
      cat /proc/slabinfo &> slabinfo
      cat /proc/zoneinfo &> zoneinfo
      cat /proc/interrupts &> interrupts 
      cat /proc/iomem &> iomem
      cat /proc/ioports &> ioports
      slabtop -o &> slabtop_-o
      uname -a &> uname
      sysctl -a &> sysctl_-a
      lsmod &> lsmod
      cp -a /etc/modprobe.conf ./etc/ 2>> error_log
      cp -a  /etc/sysconfig/* ./etc/sysconfig/ 2>> error_log
      for MOD in $(lsmod | grep -v "Used by" | awk '{ print $1 }'); do modinfo "$MOD" >> modinfo 2>&1; done
      ipcs -a &> ipcs_-a
      ipcs -s | awk '/^0x/ {print $2}' | while read semid; do ipcs -s -i $semid; done &> ipcs_-s_verbose
      sar -A &> sar_-A
      cp -a /var/log/dmesg dmesg 2>> error_log
      dmesg &> dmesg_now
      
      echo -e "Gathering hardware information..."
      dmidecode &> dmidecode
      lspci -vvv &> lspci_-vvv
      lspci &> lspci
      cat /proc/meminfo &> meminfo  
      cat /proc/cpuinfo &> cpuinfo
      
      echo -e "Gathering kdump information..."
      cp -a /etc/kdump.conf ./etc/ 2>> error_log
      ls -laR /var/crash &> ls-lar-var-crash
      ls -1 /var/crash | while read n; do mkdir -p var/crash/${n}; cp -a /var/crash/${n}/vmcore-dmesg* var/crash/${n}/ 2>> error_log; done
      
      echo -e "Gathering OpenShift (kubelet) information..."
      mkdir openshift
      journalctl --no-pager --unit kubelet &> openshift/journalctl_--no-pager_--unit_kubelet
      systemctl status kubelet --no-pager &> openshift/systemctl_status_kubelet
      
      echo -e "Gathering container related information..."
      mkdir {container,container/container_info}
      
      journalctl --no-pager --unit crio &> container/journalctl_--no-pager_--unit_crio
      systemctl status crio --no-pager &> container/systemctl_status_crio
      
      crio config &> container/crio_config
      crictl stats &> container/stats
      crictl info &> container/info
      crictl pods &> container/pods
      crictl pods -v &> container/pods_-v
      crictl ps -a &> container/ps_-a
      crictl image &> container/image_list
      
      crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do mkdir container/container_info/$id; done
      crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do crictl inspect $id &> container/container_info/$id/${id}_inspect; done
      crictl ps -a | awk '$1!="CONTAINER" {print $1}' | while read id; do crictl logs $id &> container/container_info/$id/${id}_logs; done
      
      echo -e "Gathering logs..."
      cp -a /var/log/{containers*,pods*,message*,secure*,boot*,cron*,yum*,Xorg*,sa,rhsm,audit,dmesg} ./var/log/ 2>> error_log
      cp -a /etc/*syslog.conf ./etc/ 2>> error_log
      
      echo -e "Compressing files..."
      tar -cjf /tmp/sosreport.tar.bz2 ./
      
      echo -e "Script complete."
      
    5. Run the script.

      [root@ip-10-0-132-143 ~]# chmod +x man_sosreport.sh
      [root@ip-10-0-132-143 ~]# ./man_sosreport.sh
      
    6. Attach /tmp/sosreport.tar.bz2 to the support case.
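
The script writes the archive to /tmp on the node, so it has to be copied to a workstation before it can be attached. The following is a minimal sketch of one way to do that without SSH, run on the workstation rather than on the node. It assumes that oc is logged in to the cluster and that the host's /tmp is visible as /host/tmp inside the debug pod. The function name fetch_sosreport and the default destination file name are hypothetical.

```shell
#!/bin/bash
# Sketch: copy the archive created by man_sosreport.sh from a node to the
# local workstation. Assumption: /tmp on the host appears as /host/tmp in
# the debug pod. fetch_sosreport is a hypothetical helper name.

fetch_sosreport() {
    local node="$1"                                   # node name as shown by 'oc get nodes'
    local dest="${2:-./sosreport-${node}.tar.bz2}"    # local destination (hypothetical naming)

    # Stream the archive through the debug pod's stdout into a local file.
    oc debug "node/${node}" -- bash -c 'cat /host/tmp/sosreport.tar.bz2' > "${dest}" || return 1

    # Confirm that the local copy is a readable bzip2-compressed tar archive.
    tar -tjf "${dest}" > /dev/null || return 1

    # Print the path of the verified local copy.
    echo "${dest}"
}
```

For example, fetch_sosreport ip-10-0-132-143.eu-west-3.compute.internal would leave the archive in the current directory. Including the node name in the file name keeps archives from several nodes apart.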

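The comment at the top of the script suggests commenting out any command that hangs and re-running. As an alternative sketch, each command could be wrapped with timeout from coreutils, assuming it is available on the node, so that one hanging command cannot block the rest of the collection. The names run_cmd and CMD_TIMEOUT are hypothetical and are not defined by the original script.

```shell
#!/bin/bash
# Sketch: run one collection command with a time limit.
# run_cmd and CMD_TIMEOUT are hypothetical names, not part of the original script.

CMD_TIMEOUT="${CMD_TIMEOUT:-60}"   # seconds allowed per command (assumed default)

run_cmd() {
    local outfile="$1"
    shift

    # timeout(1) exits with status 124 when the time limit is reached.
    timeout "${CMD_TIMEOUT}" "$@" &> "${outfile}"
    local rc=$?

    # Record timed-out commands in error_log in the current directory.
    if [ "${rc}" -eq 124 ]; then
        echo "TIMEOUT after ${CMD_TIMEOUT}s: $*" >> error_log
    fi
    return "${rc}"
}
```

With this helper, a line such as rpm -Va &> rpm-Va would become run_cmd rpm-Va rpm -Va. Lines that use pipes or loops would need to be passed as a single bash -c '...' command. A process blocked in uninterruptible I/O may not respond to the signal that timeout sends, so this does not cover every kind of hang.
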
Root Cause

Sos report is failing.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.