Recovering from expired OpenShift ingress certificates (OCP 4.x)

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Container Platform 4 (OCP)
  • During login to CLI or console

Issue

  • Login to the cluster console or via the OCP CLI fails with errors such as:

      x509: certificate has expired or is not yet valid: current time 2022-01-10T19:11:09Z is after 2022-01-09T15:49:17Z
      
      E0110 ... 1 auth.go:235] error contacting auth provider request to OAuth issuer endpoint https://oauth-openshift.apps.my-cluster.com/oauth/token failed: Head "https://oauth-openshift.apps.my-cluster.com": 
    

Resolution

  1. Replace the expired openshift-ingress certificates by following the latest documentation

  2. Regain access to the cluster using either the kubeconfig generated at installation (copied to a master node first) or the recovery kubeconfig:

    Complete the following steps to regain access to the control plane using the default recovery kubeconfig available on any master node:

     $ ssh core@<master0>
     $ sudo -i 
     # export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig
     # oc whoami
    

Note: do NOT run oc login after exporting the above recovery kubeconfig, as doing so converts the file from certificate-based to token-based authentication and you will no longer be able to use it on this master node.

  1. Redeploy the default ingress certificate:

    Create a config map that includes only the root CA certificate used to sign the wildcard certificate:

     $ oc create configmap custom-ca \
         --from-file=ca-bundle.crt=</path/to/example-ca.crt> \
         -n openshift-config
    

    </path/to/example-ca.crt> is the path to the root CA certificate file on your local file system.
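    Before creating the config map, it can help to confirm that the CA file actually parses as an X.509 certificate. The paths and subject below are illustrative only; a throwaway self-signed CA is generated purely for demonstration, so substitute your real root CA file in practice:

    ```shell
    # Illustration only: generate a throwaway self-signed root CA.
    # In practice, skip this step and point at your real example-ca.crt.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout /tmp/example-ca.key -out /tmp/example-ca.crt \
        -days 365 -subj "/CN=Example Root CA"

    # Confirm the file parses as a certificate; inspect subject and expiry:
    openssl x509 -noout -subject -enddate -in /tmp/example-ca.crt
    ```

    If openssl reports "unable to load certificate", the file is not a valid PEM certificate and the config map would be created with unusable content.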

  2. Update the cluster-wide proxy configuration with the newly created config map:

     $ oc patch proxy/cluster \
         --type=merge \
         --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'
    
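    The --patch argument must be well-formed JSON; a malformed patch is rejected with a parse error. If in doubt, the patch can be checked locally before applying it (python3 -m json.tool is used here purely as a convenient validator):

    ```shell
    # Pre-validate the merge patch before handing it to oc patch:
    patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'
    echo "$patch" | python3 -m json.tool
    ```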
  3. Create a secret that contains the wildcard certificate chain and key:

     $ oc create secret tls <secret> \
         --cert=</path/to/cert.crt> \
         --key=</path/to/cert.key> \
         -n openshift-ingress
    

    <secret> is the name of the secret that will contain the certificate chain and private key.
    </path/to/cert.crt> is the path to the certificate chain on your local file system.
    </path/to/cert.key> is the path to the private key associated with this certificate.
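    A common failure mode at this step is a mismatched certificate and key, which only surfaces later when the router reloads. The pair can be verified before creating the secret; the file paths below are placeholders, and a throwaway pair is generated purely for demonstration:

    ```shell
    # Illustration only: generate a throwaway certificate/key pair.
    # In practice, use your real wildcard certificate and key files.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout /tmp/cert.key -out /tmp/cert.crt \
        -days 365 -subj "/CN=*.apps.my-cluster.com"

    # The public key derived from each file must be identical:
    cert_pub=$(openssl x509 -in /tmp/cert.crt -noout -pubkey)
    key_pub=$(openssl pkey -in /tmp/cert.key -pubout)
    [ "$cert_pub" = "$key_pub" ] && echo "certificate and key match"
    ```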

  4. Update the Ingress Controller configuration with the newly created secret:

     $ oc patch ingresscontroller.operator default \
         --type=merge -p \
         '{"spec":{"defaultCertificate": {"name": "<secret>"}}}' \
         -n openshift-ingress-operator
    

    Replace <secret> with the name used for the secret in the previous step.

Note: Once the Ingress Controller is patched, a rolling deployment restarts the affected pods, restoring console access and oc CLI login.

Root Cause

OpenShift ingress custom certificates are an integral part of the security chain for logins and system operations. An expired ingress certificate cannot be bypassed with oc --insecure-skip-tls-verify=true

See also the related KCS article detailing how to re-issue the default ingress certificates (if you are not using a custom certificate).

Diagnostic Steps

  • Check expiration date of ingress certificates:

      # oc get secrets -n openshift-ingress <your-custom-cert>
      NAME                   TYPE                DATA   AGE
      <your-custom-cert>   kubernetes.io/tls   2      2y1d #<----- (1 day past expiration)
    
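  • The AGE column only hints at expiry; the certificate's notAfter date can be checked directly with openssl. On a live cluster you would first extract tls.crt from the secret (the oc command in the comment is a sketch); here a short-lived certificate is generated purely for illustration:

    ```shell
    # On a cluster, extract the served certificate first, e.g.:
    #   oc get secret <your-custom-cert> -n openshift-ingress \
    #       -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt
    # Illustration only: generate a 1-day certificate instead.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout /tmp/tls.key -out /tmp/tls.crt \
        -days 1 -subj "/CN=*.apps.my-cluster.com"

    # Print the expiration date, then test whether it has already passed:
    openssl x509 -noout -enddate -in /tmp/tls.crt
    if openssl x509 -noout -checkend 0 -in /tmp/tls.crt >/dev/null; then
        echo "certificate is still valid"
    else
        echo "certificate has expired"
    fi
    ```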
  • Validate that you are able to regain access to the control plane with kubeconfig export as detailed in resolution steps above, then check on status of pods in openshift-console:

      $ oc get pods -n openshift-console
      NAME                         READY   STATUS    RESTARTS   AGE
      console-f8b8d688f-7pqc9      0/1     Running   13         59m
      console-f8b8d688f-hv4h5      0/1     Running   13         59m
      downloads-6c96776f98-45jfv   1/1     Running   0          3h10m
      downloads-6c96776f98-wjbq6   1/1     Running   0          3h10m
    
  • Review the logs of the restarting pods for the error: x509: certificate has expired or is not yet valid

  • Validate that the time on the cluster nodes is correct, to confirm this is not a clock skew problem.
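    The same x509 error appears when a node's clock drifts outside the certificate's validity window, so it is worth confirming that the current time falls between notBefore and notAfter. A minimal sketch, assuming GNU date and using a sample certificate generated purely for illustration:

    ```shell
    # Illustration only: generate a sample certificate to check against.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout /tmp/skew.key -out /tmp/skew.crt \
        -days 30 -subj "/CN=skew-check"

    # Convert the validity bounds and current time to epoch seconds:
    now=$(date -u +%s)
    not_before=$(date -d "$(openssl x509 -noout -startdate -in /tmp/skew.crt | cut -d= -f2)" +%s)
    not_after=$(date -d "$(openssl x509 -noout -enddate -in /tmp/skew.crt | cut -d= -f2)" +%s)

    if [ "$now" -ge "$not_before" ] && [ "$now" -le "$not_after" ]; then
        echo "clock is inside the certificate validity window"
    else
        echo "possible clock skew: current time is outside the validity window"
    fi
    ```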

Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.