Troubleshooting OpenShift Container Platform 4.x: Storage

This article is part of the OpenShift Container Platform 4.x troubleshooting series.

Basic data to gather and questions to ask:

  • Depending on the storage provisioner used, gather the provisioner pod logs from the storage provider's project:
$ oc adm inspect ns/<project-name>
  • A cluster must-gather.

    • If there are attach/detach controller events, check the kube-controller-manager and kube-apiserver logs.
  • An ocs-must-gather if the "OpenShift Container Storage" (OCS) or "OpenShift Data Foundation" (ODF) product is installed.

- With OCS version 4.8 or below:
  $ oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.8 --dest-dir=<directory-name>
  Note: Replace v4.8 with the respective version.
- With ODF version 4.9 or later:
  $ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.11 --dest-dir=<directory-name>
  • sosreports from nodes where the storage is mounted or the pod(s) are scheduled.

    • Identify client messages (search the node journal for the storage being mounted to the node).
    • Identify kubelet messages (the kubelet leverages client utilities to mount storage):

$ oc adm node-logs --role=master -u kubelet
$ oc adm node-logs --role=worker -u kubelet 
  • Collect events from the affected component's namespace(s) covering that time period.

  • Is the issue affecting workloads in all namespaces or only a subset?

Tests

  • Emulate the mounting of the storage. The PV object contains all of the information that OCP passes to the provisioner/cloud provider to mount the storage.
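As a sketch of where that information lives, the fields a manual mount test needs can be read straight out of the PV spec. The NFS-backed PV below is made up for illustration; on a live cluster the same fields come from `oc get pv <pv-name> -o yaml`.

```shell
# Hypothetical NFS-backed PV; on a real cluster use: oc get pv <pv-name> -o yaml
cat > /tmp/example-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com
    path: /exports/data
EOF

# Pull out the fields a manual mount test on a node would need
server=$(awk '/server:/ {print $2}' /tmp/example-pv.yaml)
path=$(awk '$1 == "path:" {print $2}' /tmp/example-pv.yaml)
echo "mount -t nfs ${server}:${path} /mnt/test"   # command to run on the node
```

Running the printed mount command on the node (from a debug shell) emulates what the kubelet does and isolates storage problems from OCP itself.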

  • Perform a touch test either from within the problem pod or from a test pod with the same/similar PVC and PV configured and mounted.
    This is commonly used for permission issues, to identify whether the configured uid/gid has permission to create a file on the filesystem path specified in spec.containers[].volumeMounts of the pod.
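A minimal touch test might look like the following. A temporary directory stands in for the pod's mountPath here so the sketch runs anywhere; inside the pod you would use the real volumeMounts path (e.g. /mnt/data).

```shell
# Stand-in for the pod's mountPath; inside the pod this would be the
# directory from spec.containers[].volumeMounts (e.g. /mnt/data).
mountpath=$(mktemp -d)

# The touch test: can the current uid/gid create and remove a file there?
if touch "$mountpath/write-test" 2>/dev/null; then
  result="writable"
  rm -f "$mountpath/write-test"
else
  result="permission denied"
fi
echo "$result"
rmdir "$mountpath"
```

Inside a running pod the same check is typically run via `oc rsh <pod>` or `oc exec <pod> -- touch <mountPath>/write-test`.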

  • dd
    A basic tool for testing reads and writes to/from a file. It gives very general throughput data and is typically only used as a starting point for gauging read/write performance. The dd command reads one block of input, processes it, and writes it to the specified output. Example: write an ISO file to a USB drive (/dev/sdb).

    • NOTE: The ISO is written to sdb, not sdbN. The ISO file contains partitions! When the ISO is written to /dev/sdb (a device file), the partition metadata contained in the ISO is written to the sdb device as well. Run ls /dev after running the dd command and it will show sdb1, sdb2, etc.
$ dd if=/path/to/ISO of=/path/to/device 
Ex. $ dd if=/home/username/fedora34.iso of=/dev/sdb bs=512 status=progress

$ dd bs=1M count=400 if=/dev/zero of=test.dd conv=fsync
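The write test above can be paired with a read-back to gauge sequential read throughput. A small, self-contained variant (the 4 MiB size is illustrative; in a real test use a file large enough to defeat caching, and dd's exit summary reports the throughput):

```shell
# Write a small test file, forcing data to disk with conv=fsync ...
dd if=/dev/zero of=/tmp/test.dd bs=1M count=4 conv=fsync 2>/dev/null

# ... then read it back to exercise the read path
dd if=/tmp/test.dd of=/dev/null bs=1M 2>/dev/null

size=$(stat -c '%s' /tmp/test.dd)
echo "$size"          # 4 MiB = 4194304 bytes
rm -f /tmp/test.dd
```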

Tools

  • oc rsync
    In some cases, workloads in OCP use rsync to read/write data from the backing storage.
    Rsync is notable because its delta-transfer algorithm minimizes the amount of data sent across the network.

    • Rsync is a fast and versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specifications of the set of files to be copied. Rsync finds files that need to be transferred using an algorithm (by default) that looks for files that have changed in size or in last-modified time.
      Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated.
    • Very useful for backing up (or migrating) the data from persistent volumes!
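The default quick check described above can be sketched in a few lines of shell. This is illustrative only (rsync implements it internally, with many more options): a file is skipped when its size and modification time match the destination copy.

```shell
# Sketch of rsync's default "quick check" (illustrative, not rsync itself)
src=$(mktemp); dst=$(mktemp)
printf 'hello' > "$src"
cp -p "$src" "$dst"                  # -p preserves the modification time

quick_check() {                      # prints "skip" or "transfer"
  if [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
  then echo skip; else echo transfer; fi
}

first=$(quick_check "$src" "$dst")   # identical size and mtime -> skip
printf ' world' >> "$src"            # size changed -> transfer
second=$(quick_check "$src" "$dst")
echo "$first $second"
rm -f "$src" "$dst"
```

For backing up a persistent volume's data, the usual form is `oc rsync <pod>:<mountPath> <local-dir>`.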
  • Collectl
    Gathers disk statistics over time. Not as granular as fio, but provides insight into whether time-scoped issues are related to spikes in specific values collected by collectl.

    • The default directory where collectl logs are stored is /var/log/collectl. Use oc rsh (for a pod) or oc debug node/<node-name> (for a node) to access the host and collect these logs.
  • sysstat/sar
    In RHEL sosreports there is plaintext sar data in /var/log/sa/sarN. These files are written at midnight, so to inspect the current day's plaintext sar data, generate it with sa1 or sa2. In future Red Hat Enterprise Linux CoreOS releases, sysstat will be configured by default.
