OVS and SDN Pods in CrashLoopBackOff after upgrade to OpenShift 4.6

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform
    • 4.6.1 -> 4.6.6

Issue

  • Some OVS and SDN Pods in the openshift-sdn namespace are in CrashLoopBackOff after upgrading to OpenShift 4.6:
# oc get pods -n openshift-sdn -o wide
ovs-gppl4              0/1     CrashLoopBackOff   6          43h   10.20.196.191   node01.example.com   <none>           <none>
sdn-97njs              0/2     CrashLoopBackOff   6          43h   10.20.196.191   node01.example.com   <none>           <none>

Resolution

This issue is reported as resolved in OpenShift 4.6.12, which included a fix for a kernel entropy-related issue.

This issue was reviewed by the OpenShift engineering team in Bugzilla 1895024.

If you see this issue after a node reboot, check whether the openvswitch and ovs-configuration services are running on that node. If they are not, you can restart them manually to work around the issue.

DO NOT DO THIS ON A NODE WHILE IT IS BEING UPGRADED
First check whether the node has booted into the new osImage. If it reports a 4.6 image (the version string begins with "CoreOS 46.", as in the output below), proceed with restarting the services:

$ oc get node worker-0.example.com -o template='{{.status.nodeInfo.osImage}}{{"\n"}}'
Red Hat Enterprise Linux CoreOS 46.82.202011111640-0 (Ootpa)

$ systemctl status openvswitch.service
$ systemctl status ovs-configuration.service
$ systemctl restart openvswitch.service
$ systemctl restart ovs-configuration.service
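
The osImage check above can be scripted so the decision is not made by eye. The following is a minimal sketch, not a supported tool: the function name rhcos_46_booted and the NODE variable are illustrative, and it assumes that an RHCOS 4.6 osImage string contains "CoreOS 46." as in the example output. It only reports a result; it does not detect whether an upgrade is still in progress, and the restarts themselves are still run by hand on the node.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative.

# Print "yes" when an osImage string looks like an RHCOS 4.6 build
# (contains "CoreOS 46."), otherwise print "no".
rhcos_46_booted() {
    case "$1" in
        *"CoreOS 46."*) echo "yes" ;;
        *)              echo "no"  ;;
    esac
}

# NODE is a placeholder; set it to the node being checked,
# for example NODE=worker-0.example.com
NODE="${NODE:-}"
if [ -n "$NODE" ] && command -v oc >/dev/null 2>&1; then
    image="$(oc get node "$NODE" -o jsonpath='{.status.nodeInfo.osImage}')"
    if [ "$(rhcos_46_booted "$image")" = "yes" ]; then
        echo "${NODE}: ${image} -> booted into 4.6; restarting the services on the node is reasonable"
    else
        echo "${NODE}: ${image} -> not on a 4.6 osImage; do not restart the services"
    fi
fi
```

Run it from a host with cluster access, for example with NODE=worker-0.example.com set in the environment.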

Root Cause

  • Research so far indicates that the OVS-related services time out during startup and fail to recover afterwards.
  • A possible root cause is low entropy on the system.

Diagnostic Steps

  • Check whether the OVS-related services are running with the following systemctl command, or check sos_commands/systemd inside a Sosreport:
$ systemctl status ovs-configuration openvswitch ovsdb-server ovs-vswitchd
  • Check whether the OVS-related services failed during startup with the following journalctl command, or check sos_commands/logs inside a Sosreport:
$ journalctl --no-pager | grep -E 'systemd.* (ovsdb-server|openvswitch|ovs-configuration|ovs-vswitchd)\.service'
Dec 05 01:21:04 node01.example.com systemd[1]: openvswitch.service: Consumed 2ms CPU time
Dec 05 01:21:04 node01.example.com systemd[1]: ovs-vswitchd.service: Consumed 1min 58.508s CPU time
Dec 05 01:21:05 node01.example.com systemd[1]: ovsdb-server.service: Consumed 13.805s CPU time
Dec 05 01:23:24 localhost systemd[1]: ovsdb-server.service: Start operation timed out. Terminating.
Dec 05 01:24:06 localhost systemd[1]: ovsdb-server.service: Failed with result 'timeout'.
Dec 05 01:24:06 localhost systemd[1]: ovs-configuration.service: Job ovs-configuration.service/start failed with result 'dependency'.
Dec 05 01:24:06 localhost systemd[1]: openvswitch.service: Job openvswitch.service/start failed with result 'dependency'.
Dec 05 01:24:06 localhost systemd[1]: ovs-vswitchd.service: Job ovs-vswitchd.service/start failed with result 'dependency'.
Dec 05 01:24:06 localhost systemd[1]: ovsdb-server.service: Consumed 189ms CPU time
Dec 05 01:24:07 localhost systemd[1]: ovsdb-server.service: Service RestartSec=100ms expired, scheduling restart.
Dec 05 01:24:07 localhost systemd[1]: ovsdb-server.service: Scheduled restart job, restart counter is at 1.
Dec 05 01:24:07 localhost systemd[1]: ovsdb-server.service: Consumed 0 CPU time
  • Check whether the system has low entropy by reading the following file (the file location is the same in a Sosreport). Low entropy is difficult to define exactly, but a value lower than 1,000 is generally accepted as "low" and could lead to process hangs:
$ cat /proc/sys/kernel/random/entropy_avail 
815
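
The journal and entropy checks above can be combined into two small helpers. This is a rough sketch under stated assumptions: the function names timed_out_units and entropy_status are illustrative, the journal patterns are taken from the sample log lines in this article, and the 1,000 threshold is the rule of thumb quoted above rather than a hard limit.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any Red Hat tooling.

# Read journal lines on stdin and print each unit whose start timed out.
# The patterns mirror the sample messages shown in this article.
timed_out_units() {
    grep -E "systemd\[[0-9]+\]: [^ ]+\.service: (Start operation timed out|Failed with result 'timeout')" \
        | sed -E 's/.*systemd\[[0-9]+\]: ([^ ]+\.service): .*/\1/' \
        | sort -u
}

# Classify an entropy_avail reading; anything below the threshold is "low".
entropy_status() {
    local value="$1"
    local threshold="${2:-1000}"   # assumed default: the rule of thumb above
    if [ "$value" -lt "$threshold" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Report the live value only when the entropy file is readable.
ENTROPY_FILE="/proc/sys/kernel/random/entropy_avail"
if [ -r "$ENTROPY_FILE" ]; then
    current="$(cat "$ENTROPY_FILE")"
    echo "entropy_avail=${current} ($(entropy_status "$current"))"
fi
```

On a node, the first helper can be fed live data with journalctl --no-pager | timed_out_units, or it can read the journal output saved in a Sosreport. With the sample log above it would report only ovsdb-server.service, because the other units failed with result 'dependency' rather than 'timeout'.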

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.