OCP4 - kube-controller-manager timeout is exceeded by DNS lookup timeout caused by offline primary nameserver
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
Issue
- Kube Controller Manager pods fail to complete a leader election.
- Networking is degraded and Kube Controller Manager is in CrashLoopBackOff state.
- Configmaps take longer than 5s to create, but deletion occurs in under a second as expected.
- The status of resources does not match reality, and nodes/pods/resources may report READY when in fact they are offline:
  - After following guidance to restart OVNKube-Node pods as part of a reset of OVN databases, the daemonset reports all pods in READY state, but no pods are running on the corresponding nodes.
  - The node status is Ready, but the nodes are not.
- Deleted pods are not rescheduled.
Resolution
- See KCS#6995290 for the possibility that failing ValidatingWebhooks are the cause.
- See Diagnostic Steps below to confirm that you are UNABLE to complete a lookup for api-int.<yourcluster>.<yourdomain> in under 5s. If you have 2 nameservers, the cURL will likely succeed, but will complete 5s+ later because it fails over to the secondary nameserver.
- As a workaround, first get the cluster back on its feet: enter the pod and create a temporary /etc/hosts entry that maps api-int.<yourcluster>.<yourdomain> to its target IP address, so that the DNS lookup is skipped and pods can schedule:
    oc project openshift-kube-controller-manager
    oc rsh pod/<kube-controller-manager1>
    vi /etc/hosts
    #enter the following on a new line:
    <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
- Repeat this procedure for every kube-controller-manager pod. (On the next pod restart, the entry is cleared and the pod reverts to relying on its default nameserver.)
- If you cannot reach the pods via oc, ssh to the master nodes and use crictl to make the changes:
    ssh core@<master-node1>
    sudo su #ascend to root
    crictl ps | grep kube-controller-manager
    crictl exec -it <CID-from-left-column-for-kube-controller-manager-container> /bin/bash
    vi /etc/hosts
    #enter the following on a new line:
    <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
- After updating /etc/hosts on these pods, one of the pods will assume leadership and your cluster will start to stabilize again. Wait for all cluster operators (CO) to return to READY and for nodes and pods to schedule.
- Then update your nameservers using one of the supported methods below, so that your primary nameserver is available to the cluster and the soft-lock behavior is prevented (or resolve the issues with your upstream DNS platform).
You can use the NMState Operator to set DNS directly for the nodes in your cluster. See KCS#7031371 and the official documentation at About the Kubernetes NMState Operator and Observing and updating the node network state and configuration.
Note: create a unique nmconfig for each node rather than one blanket configuration for all workers (though a blanket configuration will work). This gives you granular control over changes (one node at a time), and if any node stalls for any reason, it won't delay rollout to the rest of the host nodes when you apply the change locally.
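As an illustration of the per-node approach, the sketch below prints one NodeNetworkConfigurationPolicy per node that sets the DNS servers through the `dns-resolver` section of the desired state. The helper name `render_dns_nncp`, the policy naming scheme, and the example addresses are assumptions made for this sketch. It also assumes statically configured nameservers; if the interfaces obtain DNS from DHCP, additional interface settings may be needed, so check KCS#7031371 and the NMState documentation before applying anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat tooling): print a per-node
# NodeNetworkConfigurationPolicy that sets the DNS servers for one node.
# Arguments: node hostname, then one or more nameserver IP addresses.
render_dns_nncp() {
    node="$1"
    shift
    cat <<EOF
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dns-${node}
spec:
  nodeSelector:
    kubernetes.io/hostname: ${node}
  desiredState:
    dns-resolver:
      config:
        server:
EOF
    # One list entry per nameserver, in the order given.
    for server in "$@"; do
        printf '        - %s\n' "$server"
    done
}

# Example usage (placeholder values), one node at a time:
#   render_dns_nncp <node-hostname> <nameserver1> <nameserver2> | oc apply -f -
```

Rendering one policy per node keeps each rollout independent, which matches the note above about not letting a stalled node delay the others.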
Alternatively, you can use the DNS operator to modify your nameservers by following the official documentation at Using DNS forwarding.
Optionally, you can also modify /etc/resolv.conf on the nodes directly using MachineConfig updates, or use nmcli changes on the primary NICs if the nameservers are embedded in those interface definitions:
- Modifying DNS settings in OpenShift 4 nodes via MachineConfig
- Modifying DNS settings in OpenShift 4 nodes with OVN via MachineConfig
Lastly, if the nameservers are configured via DHCP, you can update them on the upstream DHCP server; this option may require restarting the host node after the DHCP lease configuration is modified on the upstream server.
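The temporary /etc/hosts workaround described earlier in this section can also be applied without an interactive editor. The following is a minimal sketch, assuming a POSIX shell and grep are available in the container; the function name `add_hosts_entry` is hypothetical. It appends the entry only when the hostname is not already present, so repeating it on the same pod does not create duplicates.

```shell
#!/bin/sh
# Hypothetical helper: append "IP HOST" to a hosts file unless the file
# already contains the hostname as a fixed string.
# Arguments: hosts file path, IP address, hostname.
add_hosts_entry() {
    hosts_file="$1"
    ip="$2"
    host="$3"
    # -F treats the hostname as a literal string, -q suppresses output.
    if grep -qF -- "$host" "$hosts_file" 2>/dev/null; then
        echo "entry for $host already present in $hosts_file"
    else
        printf '%s %s\n' "$ip" "$host" >> "$hosts_file"
        echo "added $ip $host to $hosts_file"
    fi
}

# Example usage inside the pod (placeholder values):
#   add_hosts_entry /etc/hosts <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
```

As with the manual edit, the entry does not survive a pod restart, so it remains a stopgap until the nameservers are fixed.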
Root Cause
- A DNS lookup takes 5s to time out when the primary nameserver is offline. (This is the maximum amount of time allocated to a DNS query that hits an unreachable endpoint before it fails over to the next nameserver in the list, i.e. the maximum wait time for an ack on the DNS packet.)
- If the primary nameserver is offline/unreachable, the call takes 5s to fail before the next nameserver is tried. That pushes the request past the 5s maximum wait time that kube-controller-manager allows for pulling an update on the configmap used to determine leader election and allow pod scheduling.
- We have submitted a bug to (possibly) extend this timeout value: https://issues.redhat.com/browse/OCPBUGS-25879
- This behavior can also be caused by a Validating Webhook, as outlined in this KCS: https://access.redhat.com/solutions/6995290, so cross-reference it if this issue does not match the DNS failure.
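To see where the 5s figure comes from on a given system, the sketch below reads a resolv.conf file and prints the per-nameserver timeout the resolver would use. It assumes glibc-style resolver semantics, where `options timeout:N` overrides a default of 5 seconds; the function name `resolver_timeout` is hypothetical, and the value it prints is the delay before the second nameserver is tried when the first one is unreachable.

```shell
#!/bin/sh
# Hypothetical helper: print the per-nameserver timeout in seconds for the
# given resolv.conf. Uses the last "options timeout:N" value found, or the
# glibc default of 5 when no such option is set.
resolver_timeout() {
    awk '
        $1 == "options" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^timeout:[0-9]+$/) { split($i, kv, ":"); t = kv[2] }
        }
        END { print (t == "" ? 5 : t) }
    ' "$1"
}

# Example usage:
#   resolver_timeout /etc/resolv.conf
```

If this prints 5 and the first nameserver is down, every lookup of api-int spends the full 5s on the dead server before the secondary is asked, which matches the behavior described above.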
Diagnostic Steps
- Check that you can resolve your api-int.<yourcluster>.<yourdomain> address from inside your kube-controller-manager pods:
    oc rsh pod/<kube-controller-manager>
    curl -w 'lookup: %{time_namelookup}\nconnect: %{time_connect}\nappconnect: %{time_appconnect}\npretransfer: %{time_pretransfer}\nstarttransfer: %{time_starttransfer}\ntotal: %{time_total}\nhttp_code: %{http_code}\n' --cert /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.crt --key /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.key https://api-int.<yourcluster>.<yourdomain>:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager -vk
- The above result needs to be returned in under 5s for kube-controller-manager to update correctly and acquire leader election.
- Check that all nameservers are online/available (in particular, the first nameserver in your /etc/resolv.conf must be online).
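The two checks above can be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and `probe_nameservers` relies on `dig`, which may not be installed inside the kube-controller-manager container, so it is more likely to be useful from a node or a debug host that uses the same nameservers.

```shell
#!/bin/sh
# Print the nameserver addresses from a resolv.conf file, in file order.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Exit 0 when a lookup time in seconds (for example the "lookup:" value
# printed by curl's time_namelookup) is below the limit, default 5.
lookup_fast_enough() {
    awk -v t="$1" -v limit="${2:-5}" 'BEGIN { exit !(t + 0 < limit + 0) }'
}

# Query each nameserver directly with a 2s timeout and a single try, and
# report which ones reply. Assumes dig is available where this runs.
probe_nameservers() {
    resolv_conf="$1"
    name="$2"
    for ns in $(list_nameservers "$resolv_conf"); do
        if dig +time=2 +tries=1 +short "@$ns" "$name" >/dev/null 2>&1; then
            echo "$ns: answered"
        else
            echo "$ns: no answer within 2s"
        fi
    done
}

# Example usage (placeholder hostname):
#   probe_nameservers /etc/resolv.conf api-int.<yourcluster>.<yourdomain>
#   lookup_fast_enough 5.004 || echo "DNS lookup exceeded 5s"
```

A first nameserver that reports no answer, combined with a curl lookup time of about 5s, points to the DNS failover delay described in Root Cause.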