Troubleshooting the OpenShift Container Platform 4.x: cluster-network-operator
Environment
- Red Hat OpenShift Container Platform
- 4.x
Issue
- I don't seem to have a functioning SDN on my newly created cluster. What data should I collect to investigate the issue?
Root Cause
The cluster network operator deploys the cluster's networking components. It does so in response to a network configuration object, named cluster, that the installer creates.
From a deployment perspective, the network operator is often the "canary in the coal mine." It runs very early in the installation process, after the master nodes have come up but before the bootstrap control plane has been torn down. A failing network operator can therefore indicate more subtle installer issues, such as long delays in bringing up master nodes or apiserver communication problems. That said, the operator can also have bugs of its own.
Diagnostic Steps
-
Determine that the network configuration exists:
$ oc get network -o yaml cluster
- If it doesn't exist, the installer didn't create it. You'll have to run openshift-install create manifests to determine why.
-
Check that the network-operator is running:
$ oc get po -n openshift-network-operator
-
Retrieve the logs from the operator pods:
$ oc -n openshift-network-operator logs deployment.apps/network-operator
- Note: on multi-master systems, the operator performs leader election, and every instance that is not elected leader sleeps.
- Expect the logs from one or more pods (the non-leaders) to contain very little output.
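The three diagnostic steps above can be gathered into a single helper, which may be convenient when attaching data to a support case. This is only a minimal sketch: the function name collect_network_diagnostics, the OC override variable, and the output file names are illustrative assumptions and not part of any Red Hat tooling. The oc invocations themselves are the ones listed in the steps.

```shell
#!/bin/sh
# Sketch only: runs the three diagnostic commands from this article and
# saves their output. OC, the function name, and the file names are
# hypothetical choices made for this example.
OC="${OC:-oc}"

collect_network_diagnostics() {
    diag_dir="${1:-network-operator-diagnostics}"
    diag_rc=0
    mkdir -p "$diag_dir" || return 1

    # Step 1: the network configuration that the installer should have created.
    if ! "$OC" get network -o yaml cluster > "$diag_dir/network-config.yaml" 2>&1; then
        echo "could not retrieve the network configuration 'cluster'" >&2
        diag_rc=1
    fi

    # Step 2: is the network operator running?
    if ! "$OC" get po -n openshift-network-operator > "$diag_dir/operator-pods.txt" 2>&1; then
        echo "could not list pods in openshift-network-operator" >&2
        diag_rc=1
    fi

    # Step 3: operator logs (non-leader instances may log very little).
    if ! "$OC" -n openshift-network-operator logs deployment.apps/network-operator > "$diag_dir/operator.log" 2>&1; then
        echo "could not retrieve the network operator logs" >&2
        diag_rc=1
    fi

    return "$diag_rc"
}
```

After sourcing the file in a shell that is already logged in to the cluster, running collect_network_diagnostics ./diag writes one file per step into ./diag and returns non-zero if any of the commands failed. The error output of each command is kept in the corresponding file, so a failed step still leaves something to inspect.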