OpenShift 4 Upgrade hung - Due to kube-apiserver failure

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Container Platform 4.x

Issue

  • I requested an update of my OpenShift 4.x cluster, but it is not progressing. The cause appears to be that the kube-apiserver on the node where the Cluster Version Operator (CVO) is running is not running.

Resolution

We are currently tracking this issue in BZ 1713228. In the meantime, you can work around it by deleting the Cluster Version Operator (CVO) pod so that it is rescheduled to a new node:

$ oc delete pod -n openshift-cluster-version -l k8s-app=cluster-version-operator
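Deleting the pod does not by itself guarantee that the replacement lands on a different node, so it can help to compare the node before and after. The following is a minimal sketch, not a supported procedure: `reschedule_cvo` is a hypothetical helper name, and it assumes the CVO is managed by a deployment named `cluster-version-operator` and that `oc` is logged in to the affected cluster.

```shell
#!/bin/sh
# Sketch only: reschedule_cvo is a hypothetical helper, not Red Hat tooling.
# Assumption: the CVO deployment is named "cluster-version-operator".

CVO_NS=openshift-cluster-version
CVO_SELECTOR=k8s-app=cluster-version-operator

# Print the node name(s) of the current CVO pod(s).
cvo_nodes() {
  oc get pods -n "$CVO_NS" -l "$CVO_SELECTOR" \
    -o jsonpath='{.items[*].spec.nodeName}'
}

# Delete the CVO pod, wait for the replacement, and report where it landed.
# Returns 0 if the node changed, 1 if it did not, 2 if a command failed.
reschedule_cvo() {
  cvo_before="$(cvo_nodes)" || return 2
  oc delete pod -n "$CVO_NS" -l "$CVO_SELECTOR" || return 2
  # Wait until the deployment reports its replacement pod as available.
  oc rollout status deployment/cluster-version-operator \
    -n "$CVO_NS" --timeout=120s || return 2
  cvo_after="$(cvo_nodes)" || return 2
  echo "CVO node before: $cvo_before"
  echo "CVO node after:  $cvo_after"
  if [ "$cvo_before" = "$cvo_after" ]; then
    echo "CVO was rescheduled to the same node; consider deleting the pod again"
    return 1
  fi
  return 0
}
```

Sourcing this file and running `reschedule_cvo` prints the node before and after the deletion. A return code of 1 means the scheduler placed the new pod on the same node, in which case the workaround has probably not helped yet.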

Root Cause

The CVO uses the localhost network to communicate with the kube-apiserver because, during the initial cluster bootstrap, the service network is not available until the CVO has created the network operator. As a consequence, if one of the three kube-apiserver instances fails and it is the one on the node where the CVO is placed, the CVO can neither make progress nor roll back, because it cannot reach the kube-apiserver.

Diagnostic Steps

  • Gather information on the cluster version:

    $ oc get clusterversion
    $ oc get clusterversion -o yaml
    
    • The yaml output will show status messages about your upgrade.
  • Check where the CVO is deployed:

    $ oc get pods -n openshift-cluster-version -o wide 
    
  • Check the status of the kube-apiservers on the cluster:

    $ oc get pods -n openshift-kube-apiserver
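
The checks above can be combined into one pass that answers the question this article turns on: is the kube-apiserver on the CVO's node running? The sketch below is illustrative only. The function names are hypothetical, and it assumes that the ClusterVersion object has the default name `version`, that the kube-apiserver static pods are named `kube-apiserver-<node>`, and that `oc` is logged in to the affected cluster. A pod phase of `Running` does not prove the API server is healthy, so treat a result of 0 as a hint rather than a verdict.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Red Hat tooling.

# Print one line per ClusterVersion condition: type, status, message.
# Assumption: the ClusterVersion object is named "version".
upgrade_conditions() {
  oc get clusterversion version \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Print the name of the node hosting the CVO pod.
cvo_node() {
  oc get pods -n openshift-cluster-version \
    -l k8s-app=cluster-version-operator \
    -o jsonpath='{.items[0].spec.nodeName}'
}

# Print "<pod name><TAB><phase>" for each pod in openshift-kube-apiserver on node $1.
apiserver_pods_on_node() {
  oc get pods -n openshift-kube-apiserver \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Return 0 if a kube-apiserver pod on the CVO's node is in phase Running,
# 1 if none is (the situation described in this article),
# 2 if the CVO's node could not be determined.
check_cvo_colocation() {
  chk_node="$(cvo_node)" || return 2
  [ -n "$chk_node" ] || return 2
  echo "CVO node: $chk_node"
  chk_pods="$(apiserver_pods_on_node "$chk_node")"
  echo "$chk_pods"
  # Assumption: static pod names start with "kube-apiserver-"; installer and
  # pruner pods in the same namespace are ignored.
  if printf '%s\n' "$chk_pods" |
    awk -F'\t' '$1 ~ /^kube-apiserver-/ && $2 == "Running" {found=1} END {exit !found}'
  then
    echo "kube-apiserver on $chk_node is Running"
    return 0
  fi
  echo "no Running kube-apiserver pod on $chk_node; the CVO may be unable to progress"
  return 1
}
```

After sourcing the file, run `upgrade_conditions` to see the upgrade status messages in compact form, then `check_cvo_colocation`. A return code of 1 matches the failure described under Root Cause.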
    
