How to renew or recreate a node's certificate in RHOCP4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • One or more nodes are not working and show a "NotReady" status.

  • The kubelet service log shows many messages such as:

    Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
    http: TLS handshake error from 10.173.2.64:43632: no serving certificate available for the kubelet
    
  • Node certificates are expired or mismatched, but there are no Pending CSRs.

  • How do I redeploy node certificates or perform TLS bootstrapping again?
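
To confirm that an expired certificate is the likely cause before starting the procedure, the kubelet certificates can be inspected on the node with openssl. The following is a minimal sketch and not part of the original solution: the helper name cert_status is hypothetical, and the paths in the usage comment assume the default kubelet PKI directory (/var/lib/kubelet/pki) that the Resolution section backs up.

```shell
#!/bin/bash

# cert_status FILE
# Prints "missing", "valid", or "expired" for a PEM-encoded certificate.
# Hypothetical helper for illustration only. Note that "expired" is also
# printed when openssl cannot parse the file.
cert_status() {
    local cert="$1"
    if [ ! -r "$cert" ]; then
        echo "missing"
        return 0
    fi
    # -checkend 0 exits non-zero when the certificate has already expired
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired"
    fi
}

# Example usage on the node, as root (file names are assumptions):
#   cert_status /var/lib/kubelet/pki/kubelet-client-current.pem
#   cert_status /var/lib/kubelet/pki/kubelet-server-current.pem
```

A result of "expired" or "missing" for either file is consistent with the log messages above. A "valid" result suggests the problem may lie elsewhere, for example in a CA mismatch.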

Resolution

Important: This procedure renews the certificate of a single node without recreating the node, and it assumes that the cluster as a whole is working. If multiple control plane nodes are not working due to expired certificates, refer to the product documentation or open a support case with Red Hat for further analysis.

  1. Make sure you have access to the oc command with cluster-admin privileges:

    $ oc whoami
    system:admin
    
  2. Check that there are no Pending CSRs (the command should return no output):

    $ oc get csr | grep Pending
    
  3. Create a file named recover_kubeconfig.sh on your bastion host with the following contents:

    #!/bin/bash
    
    # Exit on errors, unset variables, and failed pipeline stages
    set -euo pipefail
    
    # Internal API URL that the kubelet uses to reach the API server
    intapi="$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")"
    # Current context, its cluster name, and the API server URL of that cluster
    context="$(oc config current-context)"
    cluster="$(oc config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
    server="$(oc config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
    # CA certificate, namespace, and token of the node-bootstrapper secret
    ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
    namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.namespace}" | base64 --decode)"
    token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"
    
    # Build the bootstrap kubeconfig in a temporary file and print it to stdout
    export KUBECONFIG="$(mktemp)"
    oc config set-credentials "kubelet" --token="$token" >/dev/null
    ca_crt="$(mktemp)"; echo "$ca_crt_data" > "$ca_crt"
    oc config set-cluster "$cluster" --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
    oc config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
    oc config use-context kubelet >/dev/null
    cat "$KUBECONFIG"
    
  4. Run the script to generate a new bootstrap kubeconfig for the node:

    $ chmod 755 recover_kubeconfig.sh
    $ ./recover_kubeconfig.sh > kubeconfig-bootstrap
    
  5. SSH into the affected node, stop the kubelet service, back up the old node certificates and kubeconfig, and then remove them:

    # systemctl stop kubelet
    # mkdir -p /root/backup-certs
    # cp -a /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig /root/backup-certs
    # rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig
    
  6. Copy the generated kubeconfig-bootstrap file to /etc/kubernetes/kubeconfig on the node:

    # cp <kubeconfig-bootstrap> /etc/kubernetes/kubeconfig
    
  7. Start the kubelet service again:

    # systemctl start kubelet
    
  8. Back on the bastion host, wait a few seconds, check the certificate signing requests, and approve the node bootstrap request:

        $ oc get csr
        NAME        AGE   REQUESTOR                                                                   CONDITION
        csr-jsv6z   11s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    
        $ oc adm certificate approve csr-jsv6z
        certificatesigningrequest.certificates.k8s.io/csr-jsv6z approved
    

    Wait a few more minutes and approve the new node certificate:

        $ oc get csr
        NAME        AGE   REQUESTOR                                                                   CONDITION
        csr-jsv6z   75s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
        csr-smqlc   40s   system:node:compute-1                                                       Pending
    
        $ oc adm certificate approve csr-smqlc
        certificatesigningrequest.certificates.k8s.io/csr-smqlc approved
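
The two approval rounds in step 8 can also be scripted. The following is a minimal sketch under stated assumptions, not part of the original solution: the function names pending_csrs and approve_pending are hypothetical, and the filter assumes that CONDITION is the last column of the oc get csr table, as in the output shown above.

```shell
#!/bin/bash

# pending_csrs
# Reads the table printed by `oc get csr` on stdin and prints the names
# of the rows whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# approve_pending
# Approves every CSR that is currently Pending. Hypothetical helper:
# it does not check which node a request belongs to, so review the
# output of `oc get csr` before using it.
approve_pending() {
    local name
    oc get csr | pending_csrs | while read -r name; do
        oc adm certificate approve "$name"
    done
}

# Example usage on the bastion host:
#   oc get csr | pending_csrs     # list Pending CSR names only
#   approve_pending               # first round (node-bootstrapper request)
#   approve_pending               # second round, after the node CSR appears
```

Listing the names first and approving them individually, as in step 8, remains the safer choice when other nodes may also have requests waiting.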
    

After this, the node should go back into Ready state.
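
To verify the result, the node status can be read from the output of oc get nodes. The following is a minimal sketch, not part of the original solution: the function name node_status is hypothetical, the node name compute-1 is taken from the example CSR output above, and the filter assumes the default table layout with NAME in the first column and STATUS in the second.

```shell
#!/bin/bash

# node_status NODE
# Reads the table printed by `oc get nodes` on stdin and prints the first
# status word of NODE, for example "Ready" or "NotReady". A combined value
# such as "Ready,SchedulingDisabled" is reduced to "Ready".
node_status() {
    awk -v node="$1" 'NR > 1 && $1 == node { split($2, s, ","); print s[1] }'
}

# Example usage on the bastion host:
#   oc get nodes | node_status compute-1
```

As an alternative, oc wait --for=condition=Ready node/compute-1 --timeout=300s blocks until the node reports Ready or the timeout expires.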

