Troubleshooting OpenShift Container Platform 4.x: openshift-sdn
Environment
- Red Hat OpenShift Container Platform
- 4.x
Issue
- I don't seem to have a functioning SDN on my newly created cluster. What data should I collect to investigate the issue?
- How do I collect the OVS flows to troubleshoot SDN issues?
Diagnostic Steps
- Before working through these steps, make sure you have checked whether the openshift-networking-operator is functioning.
- You may also want to raise the openshift-sdn log level.
- On the NotReady node, find out which pods, if any, are in a bad state. Be sure to substitute the correct spec.nodeName (or remove the field selector to list the pods on every node).
$ kubectl -n openshift-sdn get pod --field-selector "spec.nodeName=ip-10-0-27-9.ec2.internal"
NAME READY STATUS RESTARTS AGE
ovs-dk8bh 1/1 Running 1 52m
sdn-8nl47 1/1 CrashLoopBackOff 3 52m
- Retrieve the logs for the SDN or OVS pod, whichever one is failing:
- Note: Be sure to replace the pod name with the pod you are debugging.
$ oc -n openshift-sdn logs sdn-8nl47
Some common error messages or situations are:
- Cannot fetch default cluster network: This means the sdn-controller has failed to run to completion. Retrieve its logs with kubectl -n openshift-sdn logs -l app=sdn-controller.
- warning: Another process is currently listening on the CNI socket, waiting 15s: Something has gone wrong, and multiple SDN processes are running. SSH to the node in question and capture the output of ps -faux.
  - If you just need the cluster up, reboot the node.
- Error messages about ovs or OpenVSwitch: Check that the ovs-* pod on the same node is healthy. Retrieve its logs with oc -n openshift-sdn logs ovs-<name>.
  - Rebooting the node often fixes the majority of issues with these errors.
- Any indication that the control plane is unavailable: Check that the apiserver is reachable from the node. You may be able to find useful information via journalctl -f -u kubelet.
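The triage steps above can be sketched as a pair of small shell helpers. This is a minimal sketch and not part of the original solution: the function names unhealthy_pods and classify_sdn_log are hypothetical, and the matched strings are taken only from the messages listed above, so real log lines may be worded differently. The ovs match in particular is crude and may produce false positives.

```shell
# Hypothetical helpers; the names are illustrative and not part of oc or kubectl.

# Print the names of pods that look unhealthy, given the table printed by
# `oc get pod` on stdin. Expects the header line to be present.
unhealthy_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next          # finished pods are not failures
    split($2, ready, "/")                # READY column, e.g. "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Map SDN pod log text on stdin to the next step suggested in this article.
# Control plane problems have no single message, so they are not matched here.
classify_sdn_log() {
  _log=$(cat)
  case "$_log" in
    *"Cannot fetch default cluster network"*)
      echo "check sdn-controller logs" ;;
    *"Another process is currently listening on the CNI socket"*)
      echo "capture ps -faux on the node" ;;
    *[Oo][Vv][Ss]*|*[Oo]pen[Vv][Ss]witch*)
      echo "check the ovs pod on the same node" ;;
    *)
      echo "no known pattern matched" ;;
  esac
}

# Usage (node and pod names are the examples from this article):
#   oc -n openshift-sdn get pod --field-selector "spec.nodeName=ip-10-0-27-9.ec2.internal" | unhealthy_pods
#   oc -n openshift-sdn logs sdn-8nl47 | classify_sdn_log
```

Both functions only read text on stdin, so they can be tried against saved command output before being pointed at a live cluster.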
- How do I gather the OVS flows to troubleshoot the SDN?
3.x docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/sdn_troubleshooting.html#is-the-open-vswitch-ovs-configured-correctly
4.x instructions:
[root@mahoneyrocks emahoney]# oc get pod -n openshift-sdn -l app=ovs
NAME READY STATUS RESTARTS AGE
ovs-25tpk 1/1 Running 0 2d13h
ovs-8wp9c 1/1 Running 0 2d13h
ovs-kg4s7 1/1 Running 0 2d13h
ovs-mz6f6 1/1 Running 0 2d13h
ovs-n49gk 1/1 Running 0 2d13h
[root@mahoneyrocks emahoney]# oc exec -n openshift-sdn ovs-25tpk -- ovs-vsctl list-br
br0
[root@mahoneyrocks emahoney]# oc exec -n openshift-sdn ovs-25tpk -- ovs-ofctl -O OpenFlow13 dump-ports-desc br0
OFPST_PORT_DESC reply (OF1.3) (xid=0x2):
1(vxlan0): addr:56:23:9f:a2:8d:e5
config: 0
state: LIVE
speed: 0 Mbps now, 0 Mbps max
..8<
LOCAL(br0): addr:6a:8f:23:3c:79:44
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
[root@mahoneyrocks emahoney]# oc exec -n openshift-sdn ovs-25tpk -- ovs-ofctl -O OpenFlow13 dump-ports-desc br0 &> /tmp/ports_desc_ovs-25tpk.out
[root@mahoneyrocks emahoney]# oc exec -n openshift-sdn ovs-25tpk -- ovs-ofctl -O OpenFlow13 dump-flows br0 &> /tmp/dump_flows_ovs-25tpk.out
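To collect the same dumps from every OVS pod rather than a single one, the commands above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function name collect_ovs_dumps and the output file layout are hypothetical, oc is assumed to be logged in to the cluster, and every pod is assumed to use the bridge br0 as in the transcript above.

```shell
# Hypothetical helper: save bridge, port and flow dumps from every OVS pod.
# Assumes `oc` is already logged in and that each pod uses bridge br0.
collect_ovs_dumps() {
  outdir=${1:-/tmp/ovs-dumps}            # output directory (assumed default)
  mkdir -p "$outdir" || return 1
  # `-o name` prints entries such as "pod/ovs-25tpk"
  for pod in $(oc get pod -n openshift-sdn -l app=ovs -o name); do
    name=${pod#pod/}                     # strip the "pod/" prefix
    oc exec -n openshift-sdn "$name" -- ovs-vsctl list-br \
      > "$outdir/list_br_${name}.out" 2>&1
    oc exec -n openshift-sdn "$name" -- ovs-ofctl -O OpenFlow13 dump-ports-desc br0 \
      > "$outdir/ports_desc_${name}.out" 2>&1
    oc exec -n openshift-sdn "$name" -- ovs-ofctl -O OpenFlow13 dump-flows br0 \
      > "$outdir/dump_flows_${name}.out" 2>&1
  done
}

# Usage:
#   collect_ovs_dumps /tmp/ovs-dumps
```

Errors from oc exec are written into the same files as the output, so a failed pod leaves a file that records why the dump could not be taken.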