How to troubleshoot CRI-O and gather a CRI-O goroutine stack dump

Solution Verified - Updated 13 Jun 2024

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4

Issue

Containers are not being created or deleted
crictl commands do not respond
CRI-O uses much more memory than usual
How to gather a CRI-O goroutine stack dump

Resolution

In general, it is good to start with a baseline: collect an sosreport from the node that is having issues. It captures the general health of the node, the journal logs (which include the CRI-O logs), and the service statuses. See gather an sosreport from the node.
The sosreport should help, but in some cases further data is needed.
Setting debug logging for CRI-O generates many more log entries, which should help pinpoint issues when they arise. Note that this restarts the process, so it may hide the current problem. See How to configure CRI-O logLevel in OpenShift 4.
If CRI-O is not performing certain operations or is using much more memory than usual, it may have goroutines that are not completing, even though CRI-O itself is still responsive. Support may ask the operator to run one of the following commands on the node to print the goroutine stacks. This does not kill the process; it only sends a USR1 signal to it.

kill -USR1 $(pidof crio)

or, using systemd:

systemctl kill -s USR1 crio.service

CRI-O catches the signal and writes the goroutine stacks to /tmp/crio-goroutine-stacks-${timestamp}.log
Attach the file to the support case, Bugzilla, or issue.
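
The signal-and-collect step above can be sketched as a few shell helpers. This is a minimal sketch, not an official tool: the function names latest_stack_dump, dump_crio_stacks, and summarize_goroutine_states are hypothetical, the 2-second wait is an assumption, and the summary assumes the dump uses the standard Go runtime stack format, where each goroutine starts with a header such as "goroutine 42 [chan receive, 5 minutes]:". Run it as root on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of CRI-O.

# Print the most recently modified goroutine dump in a directory (default: /tmp).
latest_stack_dump() {
    local dir="${1:-/tmp}"
    # ls -t sorts by modification time, newest first; prints nothing if no file matches
    ls -t "$dir"/crio-goroutine-stacks-*.log 2>/dev/null | head -n 1
}

# Send USR1 to CRI-O, wait briefly, then print the path of the newest dump.
dump_crio_stacks() {
    systemctl kill -s USR1 crio.service || return 1
    sleep 2   # assumption: a short pause is enough for the file to be written
    latest_stack_dump /tmp
}

# Count goroutines per state (running, chan receive, select, ...), most common first.
# A large and growing count in one blocked state can hint at goroutines not completing.
summarize_goroutine_states() {
    grep -E '^goroutine [0-9]+ \[' "$1" \
        | sed -E 's/^goroutine [0-9]+ \[([^],]+).*/\1/' \
        | sort | uniq -c | sort -rn
}
```

For example, `summarize_goroutine_states "$(dump_crio_stacks)"` prints a per-state count for a fresh dump. The summary is only a quick orientation aid; the full file is still what should be attached.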

If the process is entirely unresponsive, it may be necessary to capture an strace or a CRI-O core dump and attach that instead.
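
One way to decide between the goroutine dump and the heavier strace or core dump is to check whether CRI-O still answers a simple request within a time limit. The following is a rough sketch under stated assumptions: the helper names responds_within and triage_crio are hypothetical, the 10-second limit is arbitrary, and `crictl version` is used only as a lightweight request to the runtime. Any failure of the command, not just a timeout, is treated as "not responding".

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of CRI-O or crictl.

# Run a command and succeed only if it exits successfully within the given seconds.
responds_within() {
    local secs="$1"
    shift
    # timeout(1) kills the command and returns non-zero if the limit is exceeded
    timeout "$secs" "$@" >/dev/null 2>&1
}

# Suggest which data to gather, based on whether crictl gets an answer in time.
triage_crio() {
    if responds_within 10 crictl version; then
        echo "CRI-O responded: a goroutine stack dump (USR1) is the lighter option"
    else
        echo "CRI-O did not respond: consider an strace or a core dump"
    fi
}
```

Running `triage_crio` as root on the node prints one of the two suggestions. It is only a hint; Support may still request specific data regardless of the result.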

SBR

Shift

Product(s)

Red Hat OpenShift Container Platform

Components

cri-o

Category

Troubleshoot
