OCP4 - kube-controller-manager timeout is exceeded by DNS lookup timeout caused by offline primary nameserver

Solution Verified - Updated 17 May 2024

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4

Issue

Kube Controller Manager pods fail to elect a leader.
Networking is degraded and Kube Controller Manager is in CrashLoopBackOff state.
Configmaps take longer than 5s to create, but deletion completes in under a second as expected.
The status of the resources does not match reality, and nodes/pods/resources may report READY when they are in fact offline:
- After following guidance to restart the OVNKube-Node pods as part of resetting the OVN databases, the daemonset reports all pods as READY, but no pods are running on the corresponding nodes.
- Node status is Ready, but the nodes are not actually ready.
Deleted pods are not rescheduled.

Resolution

See KCS#6995290 for the possibility that failing ValidatingWebhooks cause the same symptoms.
Follow the Diagnostic Steps below to confirm that you are UNABLE to complete a lookup of api-int.<yourcluster>.<yourdomain> in under 5s. If you have two nameservers, the curl request will likely succeed, but only after 5s or more, once the resolver fails over to the secondary nameserver.
As a workaround, first get the cluster back on its feet: enter each kube-controller-manager pod and add a temporary /etc/hosts entry that maps api-int.<yourcluster>.<yourdomain> to its IP address. This skips the DNS lookup so that leader election succeeds and pods can be scheduled:
```
oc project openshift-kube-controller-manager
oc rsh -c kube-controller-manager pod/<kube-controller-manager1>
vi /etc/hosts
# add the following on a new line:
<IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
```
Repeat this procedure for every kube-controller-manager pod. The entry is temporary: when a pod restarts, it reverts to relying on the default nameservers and the entry is cleared.
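As a non-interactive alternative to editing /etc/hosts with vi, the entry can be appended with a small shell function. This is a minimal sketch: `add_hosts_entry` is a hypothetical helper name, and it assumes a POSIX shell and `awk` are available in the container. It only appends when the hostname is not already listed, so running it twice on the same pod does not create duplicate entries.

```shell
# add_hosts_entry IP HOSTNAME [HOSTS_FILE]
# Hypothetical helper: append "IP HOSTNAME" to HOSTS_FILE (default /etc/hosts)
# unless HOSTNAME is already listed on a non-comment line.
add_hosts_entry() {
  ip="$1"
  host="$2"
  file="${3:-/etc/hosts}"
  # Compare the hostname exactly against every name column (fields 2..NF),
  # skipping comment lines; awk exits 0 only if a match was found.
  if awk -v h="$host" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$file"; then
    echo "${host} is already present in ${file}; not changed"
    return 0
  fi
  printf '%s %s\n' "$ip" "$host" >> "$file"
}
```

For example, inside the pod: `add_hosts_entry <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>`.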

If you cannot reach the pods via oc, ssh to the master nodes and use crictl to make the changes:

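```
ssh core@<master-node1>
sudo su    # become root
# The grep also matches the other containers in the same pod; use the CONTAINER ID
# (first column) of the row whose NAME is exactly kube-controller-manager.
crictl ps | grep kube-controller-manager
crictl exec -it <CID-of-kube-controller-manager-container> /bin/bash
vi /etc/hosts
# add the following on a new line:
<IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
```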
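Because every container in the kube-controller-manager pod shows the pod name in `crictl ps` output, a plain `grep` returns several rows. The sketch below selects the container by exact name with crictl's `--name` regex filter and `-q` (IDs only), then appends the entry without an interactive shell. `kcm_container_id` and `kcm_append_hosts` are hypothetical helper names; the sketch assumes it runs as root on a master node and that `/bin/sh` exists in the container image.

```shell
# kcm_container_id: hypothetical helper that prints the ID of the running
# container whose NAME is exactly "kube-controller-manager". The anchored regex
# excludes the other containers that run in the same pod.
kcm_container_id() {
  crictl ps --name '^kube-controller-manager$' -q
}

# kcm_append_hosts IP HOSTNAME: hypothetical helper that appends "IP HOSTNAME"
# to /etc/hosts inside that container without opening an interactive shell.
kcm_append_hosts() {
  cid="$(kcm_container_id | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no running kube-controller-manager container found" >&2
    return 1
  fi
  crictl exec "$cid" /bin/sh -c "printf '%s %s\n' '$1' '$2' >> /etc/hosts"
}
```

For example, on each master node: `kcm_append_hosts <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>`.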

After you update /etc/hosts in these pods, one of them will acquire the leader election and the cluster will start to stabilize. Wait for all cluster operators (COs) to return to READY, for the nodes to become Ready, and for pods to schedule.
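One way to watch for this from the command line is `oc wait` with the `condition=NAME=VALUE` form. This is a minimal sketch: `wait_for_recovery` is a hypothetical helper name and the 20m default timeout is an arbitrary assumption. As described under Issue, nodes can report Ready while they are not, so also confirm that pods are actually running (for example with `oc get pods -A -o wide`).

```shell
# wait_for_recovery [TIMEOUT]: hypothetical helper that blocks until every
# ClusterOperator is Available and not Degraded, and every node reports Ready.
# The 20m default is an arbitrary assumption; adjust it for your cluster.
wait_for_recovery() {
  timeout="${1:-20m}"
  oc wait clusteroperator --all --for=condition=Available=True --timeout="$timeout" &&
    oc wait clusteroperator --all --for=condition=Degraded=False --timeout="$timeout" &&
    oc wait node --all --for=condition=Ready --timeout="$timeout"
}
```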
Then, update your nameservers using one of the supported methods below to ensure that your primary nameserver is available to the cluster and to prevent the soft-lock behavior (or resolve the issue on your upstream DNS platform).

You can use the NMState Operator to set DNS directly on the nodes in your cluster. See KCS#7031371 and the official documentation at [About the Kubernetes NMState Operator] and [Observing and updating the node network state and configuration].

Note: Create a unique NodeNetworkConfigurationPolicy for each node rather than a single policy applied to all workers (although a single policy also works). Per-node policies give you granular control, so you can apply the change one node at a time, and if any node stalls for any reason it does not delay the rollout on the remaining nodes.
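A per-node policy can be generated with a small shell function and piped to `oc apply`. This is a sketch under stated assumptions: it uses the `nmstate.io/v1` API (older operator releases used `nmstate.io/v1beta1`), selects the node by its `kubernetes.io/hostname` label, and names the policy `dns-<node>`, which is a hypothetical naming convention. If the interface also receives DNS servers from DHCP, you may need to set `auto-dns: false` on that interface in the desired state; check the NMState documentation for your version.

```shell
# nncp_dns_for_node NODE NS1 [NS2 ...]: hypothetical helper that prints a
# NodeNetworkConfigurationPolicy setting the DNS servers on a single node.
nncp_dns_for_node() {
  node="$1"
  shift
  cat <<EOF
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dns-${node}
spec:
  nodeSelector:
    kubernetes.io/hostname: ${node}
  desiredState:
    dns-resolver:
      config:
        server:
EOF
  # One list item per nameserver, in the order given (primary first).
  for ns in "$@"; do
    printf '        - %s\n' "$ns"
  done
}
```

For example: `nncp_dns_for_node <master-node1> <primary-ns-ip> <secondary-ns-ip> | oc apply -f -`, then repeat for the next node once the policy reports success.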

Alternatively, you can use the DNS Operator to modify your nameservers by following the official documentation at [Using DNS forwarding].

Optionally, you can also modify /etc/resolv.conf on the nodes directly through MachineConfig updates, or with nmcli changes on the primary NICs if the nameservers are defined in those interface profiles.
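For the nmcli route, a minimal sketch follows. `set_conn_dns` is a hypothetical helper name and the connection name is a placeholder; list the active profiles with `nmcli -g NAME,DEVICE connection show --active`. Setting `ipv4.ignore-auto-dns yes` stops DHCP-provided DNS servers from being used alongside the static ones. Re-activating the connection briefly interrupts traffic on that interface, and changes made this way are outside any MachineConfig, so apply them one node at a time.

```shell
# set_conn_dns CONNECTION NS1 [NS2 ...]: hypothetical helper that sets static
# IPv4 DNS servers on a NetworkManager connection profile and re-activates it.
set_conn_dns() {
  conn="$1"
  shift
  # "$*" joins the nameservers with spaces, the list form nmcli accepts.
  nmcli connection modify "$conn" ipv4.dns "$*" ipv4.ignore-auto-dns yes &&
    nmcli connection up "$conn"
}
```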

Lastly, if the nameservers are assigned via DHCP, you can update them upstream on the DHCP server. This option may require restarting the host node after the DHCP lease configuration is modified on the upstream server.

Root Cause

A DNS query waits 5s before failing when the primary nameserver is offline. This is the maximum time allotted to a DNS query sent to an unreachable nameserver (the maximum wait time for a reply) before the resolver fails over to the next nameserver in the list.
As a result, when the primary nameserver is offline or unreachable, each lookup spends 5s failing against it before the next nameserver is tried. That delay, plus the request itself, exceeds the 5s maximum wait time kube-controller-manager allows for retrieving the configmap it uses for leader election, so no leader is elected and pods are not scheduled.
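To see which per-nameserver timeout applies on a given node or pod, you can read /etc/resolv.conf. This minimal sketch assumes a glibc-based resolver, which uses a 5s timeout when no `options timeout:N` line is present; `resolver_timeout` is a hypothetical helper name.

```shell
# resolver_timeout [RESOLV_CONF]: hypothetical helper that prints the number of
# seconds the resolver waits on one nameserver before trying the next.
# Uses the last "options timeout:N" value found, otherwise the glibc default 5.
resolver_timeout() {
  file="${1:-/etc/resolv.conf}"
  t="$(awk '$1 == "options" {
              for (i = 2; i <= NF; i++)
                if ($i ~ /^timeout:[0-9]+$/) v = substr($i, 9)
            }
            END { if (v != "") print v }' "$file")"
  echo "${t:-5}"
}
```

With the default value of 5, a lookup that must first wait on an offline primary nameserver already consumes the whole 5s window described above.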
We have submitted a bug to evaluate (possibly) extending this timeout value: https://issues.redhat.com/browse/OCPBUGS-25879
This behavior can also be caused by a failing validating webhook, as outlined in KCS https://access.redhat.com/solutions/6995290. Cross-reference that article if your issue does not match the DNS failure.

Diagnostic Steps

Check that you can resolve your api-int.<yourcluster>.<yourdomain> address from inside your kube-controller-manager pods:

```
oc rsh -n openshift-kube-controller-manager -c kube-controller-manager pod/<kube-controller-manager>
curl -vk \
  -w 'lookup:        %{time_namelookup}\nconnect:       %{time_connect}\nappconnect:    %{time_appconnect}\npretransfer:   %{time_pretransfer}\nstarttransfer: %{time_starttransfer}\ntotal:         %{time_total}\nhttp_code:     %{http_code}\n' \
  --cert /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.crt \
  --key /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.key \
  https://api-int.<yourcluster>.<yourdomain>:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager
```

The total time must be under 5s for kube-controller-manager to update correctly and acquire leader election. A lookup time at or above 5s indicates that the resolver is waiting on an unreachable nameserver.
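If you only need the DNS portion of that measurement, the check can be scripted. This is a minimal sketch: `within_budget` and `api_int_lookup_time` are hypothetical helper names, and the 5s budget is the value stated in this article. Without the client certificate the API server will likely reject the request, but curl still records the lookup time, which is all this check uses; curl exit code 6 means the name could not be resolved at all.

```shell
# within_budget SECONDS [BUDGET]: succeed if SECONDS (may be fractional, as
# printed by curl's %{time_namelookup}) is strictly below BUDGET (default 5).
within_budget() {
  awk -v t="$1" -v b="${2:-5}" 'BEGIN { exit !(t + 0 < b + 0) }'
}

# api_int_lookup_time HOST: print only the DNS lookup time curl reports for a
# request to https://HOST:6443/ (response body and status are discarded).
api_int_lookup_time() {
  t="$(curl -sk -o /dev/null -w '%{time_namelookup}' "https://$1:6443/")"
  rc=$?
  if [ "$rc" -eq 6 ]; then
    echo "could not resolve $1" >&2
    return 1
  fi
  echo "$t"
}
```

For example: `t="$(api_int_lookup_time api-int.<yourcluster>.<yourdomain>)" && within_budget "$t" && echo "lookup OK: ${t}s"`.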
Check that all nameservers are online and reachable. In particular, the first nameserver listed in your /etc/resolv.conf must be online.
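Each nameserver can be probed in the order the resolver uses them. This minimal sketch assumes `dig` (from bind-utils) is installed where you run it; `list_nameservers` and `probe_nameservers` are hypothetical helper names, and the 2s single-try limit is an arbitrary choice. `dig` exits with status 0 whenever a server answers (including NXDOMAIN) and non-zero when no reply arrives, so "no reply" points at an offline or unreachable nameserver.

```shell
# list_nameservers [RESOLV_CONF]: print the nameserver addresses in file order.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# probe_nameservers NAME [RESOLV_CONF]: query NAME against each nameserver
# directly, with a 2 s timeout and a single try per server.
probe_nameservers() {
  name="$1"
  for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
    if dig @"$ns" +time=2 +tries=1 +short "$name" >/dev/null 2>&1; then
      echo "$ns responded"
    else
      echo "$ns no reply"
    fi
  done
}
```

For example: `probe_nameservers api-int.<yourcluster>.<yourdomain>`. The first line of output corresponds to the primary nameserver.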

SBR

Shift Networking

Product(s)

Red Hat OpenShift Container Platform


Category

Troubleshoot


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.