OCP4 - kube-controller-manager timeout is exceeded by DNS lookup timeout caused by offline primary nameserver

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • Kube Controller Manager pods fail to acquire leader election.
  • Networking is degraded and Kube Controller Manager is in a CrashLoopBackOff state.
  • ConfigMaps take longer than 5s to create, but deletion completes in under a second as expected.
  • The status of resources does not match reality, and nodes/pods/resources may report READY when they are in fact offline:
    • After following guidance to restart OVNKube-Node pods as part of a reset of the OVN databases, the daemonset reports all pods in READY state, but no pods are running on the corresponding nodes.
    • Nodes report a Ready status, but are not actually healthy.
  • Deleted pods are not rescheduled.

Resolution

  • See KCS#6995290 for the possibility that ValidatingWebhooks are failing.

  • See the Diagnostic Steps below to confirm that you are UNABLE to complete a lookup for api-int.<yourcluster>.<yourdomain> in under 5s. If you have two nameservers, the cURL will likely succeed, but only after 5s or more as it fails over to the secondary nameserver.

  • As a workaround, first get the cluster back on its feet: enter the pod and create a temporary /etc/hosts entry pointing api-int.<yourcluster>.<yourdomain> at its target IP address, to skip the DNS lookup and allow the pods to schedule:

    oc project openshift-kube-controller-manager
    oc rsh pod/<kube-controller-manager1>
    vi /etc/hosts
    #enter the following on a new-line:
    <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
    
  • Repeat this procedure for every kube-controller-manager pod. (On the next pod restart, this entry is cleared and the pods revert to relying on their default nameservers.)
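The per-pod edit above can also be scripted. The following is a minimal sketch; the VIP address is a placeholder, and the `app=kube-controller-manager` label selector and container name are assumptions to adjust for your cluster:

```
# Placeholder values - replace with your api-int VIP and cluster domain.
API_INT_IP="198.51.100.10"
API_INT_HOST="api-int.<yourcluster>.<yourdomain>"

# Append the temporary hosts entry in each kube-controller-manager pod.
# The label selector and container name are assumptions - verify with
# `oc -n openshift-kube-controller-manager get pods --show-labels` first.
for pod in $(oc -n openshift-kube-controller-manager get pods \
    -l app=kube-controller-manager -o name); do
  oc -n openshift-kube-controller-manager exec "$pod" -c kube-controller-manager -- \
    sh -c "echo '$API_INT_IP $API_INT_HOST' >> /etc/hosts"
done
```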

  • If you cannot reach the pods via oc, ssh to the master nodes and use crictl to make the changes:

    ssh core@<master-node1>
    sudo su #ascend to root
    crictl ps | grep kube-controller-manager
    crictl exec -it <CID-from-left-column-for-kube-controller-manager-container> /bin/bash
    vi /etc/hosts
    #enter the following on a new-line:
    <IP-of-api-int-address> api-int.<yourcluster>.<yourdomain>
    
  • After updating /etc/hosts in these pods, one of the pods will assume leadership and your cluster will start to stabilize. Wait for all ClusterOperators to return to an Available state and for nodes and pods to schedule.

  • Then, update your nameservers using one of the supported methods below to ensure that your primary nameserver is reachable from the cluster and prevent the soft-lock behavior (or resolve the issue on your upstream DNS platform).

You can use the NMState Operator to set DNS directly for the nodes in your cluster. See KCS#7031371 and the official documentation at [About the Kubernetes NMState Operator] and [Observing and updating the node network state and configuration].

**Note:** create a unique NMState configuration for each node rather than a single blanket policy for all workers (though a blanket policy will work). Per-node policies give you granularity (you change one node at a time), and if any node stalls for any reason, it will not delay rollout of the change to the remaining nodes.
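A per-node policy might look like the following sketch; the policy name, node hostname, and nameserver IPs are placeholders, and the NMState Operator must already be installed:

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-0-dns                     # one policy per node, per the note above
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0     # placeholder node name
  desiredState:
    dns-resolver:
      config:
        server:
          - 198.51.100.53                # reachable primary nameserver (placeholder)
          - 198.51.100.54                # secondary nameserver (placeholder)
```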

You can alternatively use the DNS Operator to modify your nameservers by following the official documentation at [Using DNS forwarding].

You can further optionally modify /etc/resolv.conf on the nodes directly using machine-config updates, or via nmcli changes to the primary NICs if the nameservers are embedded in those interface definitions.
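As a sketch of the nmcli approach (the connection name and nameserver IPs are placeholders; run on each node, for example over ssh as core):

```
# Placeholder connection name and DNS servers - adjust for your environment.
nmcli connection modify "Wired connection 1" \
  ipv4.dns "198.51.100.53 198.51.100.54" ipv4.ignore-auto-dns yes
# Re-activate the connection so the resolver configuration is regenerated.
nmcli connection up "Wired connection 1"
```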

Lastly, if the nameservers are configured via DHCP, you can update them on the upstream DHCP server; this option may require restarting the host node after the DHCP lease configuration is modified upstream.

Root Cause

  • A DNS query takes 5s to time out when the primary nameserver is offline. (This is the maximum amount of time allocated to a DNS query against an unreachable endpoint, i.e. the maximum wait-time for an answer to the DNS packet, before failing over to the next nameserver in the list.)

  • If the primary nameserver is offline/unreachable, it takes 5s for the query to fail before the next nameserver is tried, which exceeds the 5s maximum wait-time for kube-controller-manager to pull an update on the ConfigMap used to determine leader election and allow pod scheduling.
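For reference, the glibc resolver's failover behavior is governed by options in /etc/resolv.conf, where the default per-query timeout is 5s. A resolv.conf like the following sketch (nameserver IPs are placeholders) would fail over after 2s instead of 5s; note that OpenShift manages resolv.conf on cluster nodes, so this is illustrative of the resolver behavior rather than a recommended direct edit:

```
# Illustrative only - OpenShift manages this file on cluster nodes.
options timeout:2 attempts:2
nameserver 198.51.100.53   # primary (placeholder)
nameserver 198.51.100.54   # secondary (placeholder)
```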

  • We have submitted a bug about (possibly) extending this timeout value: https://issues.redhat.com/browse/OCPBUGS-25879

  • This behavior can also be caused by a Validating Webhook, as outlined in KCS https://access.redhat.com/solutions/6995290, so cross-reference that solution if this issue does not match the DNS failure.

Diagnostic Steps

  • Check that you can resolve your api-int.<yourcluster>.<yourdomain> address from inside your kube-controller-manager pods:

    oc rsh pod/<kube-controller-manager>
    curl -vk \
      -w 'lookup:       %{time_namelookup}\nconnect:      %{time_connect}\nappconnect:   %{time_appconnect}\npretransfer:  %{time_pretransfer}\nstarttransfer: %{time_starttransfer}\ntotal:        %{time_total}\nhttp_code:    %{http_code}\n' \
      --cert /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.crt \
      --key /etc/kubernetes/static-pod-certs/secrets/kube-controller-manager-client-cert-key/tls.key \
      https://api-int.<yourcluster>.<yourdomain>:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager
  • The above result needs to be returned in under 5s for kube-controller-manager to update correctly and acquire leader election.
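The lookup line from the `curl -w` output above can be evaluated mechanically. This is a hypothetical helper (the `check_lookup_time` name is ours; the 5s threshold matches the leader-election window described in this solution):

```shell
# Read "lookup: <seconds>" lines and flag any lookup at or above 5 seconds.
check_lookup_time() {
  awk '/^lookup:/ {
    if ($2 + 0 >= 5.0)
      print "SLOW: DNS lookup took " $2 "s - primary nameserver likely offline"
    else
      print "OK: DNS lookup took " $2 "s"
  }'
}

# Example: pipe the curl timing output through the helper.
echo "lookup:       5.012" | check_lookup_time
echo "lookup:       0.004" | check_lookup_time
```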

  • Check that all nameservers are online/available (in particular, the first nameserver in your /etc/resolv.conf must be online).

Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.