How to renew or recreate a node's certificate in RHOCP4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- One or more nodes are not working and show a "NotReady" status.
- The kubelet service log shows many messages such as:

    Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
    http: TLS handshake error from 10.173.2.64:43632: no serving certificate available for the kubelet

- The node certificates are expired or mismatched, but there are no Pending CSRs.
- How do I redeploy node certificates or perform TLS bootstrapping?
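To confirm that a node shows these symptoms, the kubelet journal can be scanned for the two messages quoted above. The following is a minimal sketch, not part of the original solution: the function name kubelet_tls_symptoms is hypothetical, and it assumes the log text is piped in, for example from journalctl -u kubelet --no-pager on the affected node.

```shell
# Hypothetical helper: count the two kubelet TLS symptoms in log text read from stdin.
# Assumed usage on the node:
#   journalctl -u kubelet --no-pager | kubelet_tls_symptoms
kubelet_tls_symptoms() {
  local log unknown_ca no_serving
  log="$(cat)"
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps "set -e" callers alive
  unknown_ca="$(printf '%s\n' "$log" | grep -c 'x509: certificate signed by unknown authority' || true)"
  no_serving="$(printf '%s\n' "$log" | grep -c 'no serving certificate available for the kubelet' || true)"
  echo "unknown-authority=${unknown_ca} no-serving-cert=${no_serving}"
}
```

Non-zero counts for either message suggest the certificate problem described here; zero counts point to a different cause of the NotReady status.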
Resolution
Important: This procedure renews the certificates of a single node without recreating the node, and it assumes that the rest of the cluster is working. If multiple control plane nodes are not working due to expired certificates, refer to the product documentation or open a support case with Red Hat for further analysis.
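The note above can be turned into a quick pre-check before starting. This is an illustrative sketch under stated assumptions: single_node_procedure_ok is a hypothetical helper, it expects the output of oc get nodes -l node-role.kubernetes.io/master on stdin, and treating more than one NotReady control plane node as the stop condition is an interpretation of the note, not an official rule.

```shell
# Hypothetical pre-check based on the "Important" note.
# Assumed usage on the bastion host:
#   oc get nodes -l node-role.kubernetes.io/master | single_node_procedure_ok
single_node_procedure_ok() {
  local notready
  # STATUS is the 2nd column; it may read "NotReady" or "NotReady,SchedulingDisabled".
  # The header line (first column "NAME") is skipped if present.
  notready="$(awk '$1 != "NAME" && $2 ~ /^NotReady/ {n++} END {print n+0}')"
  if [ "$notready" -gt 1 ]; then
    echo "refuse: ${notready} control plane nodes are NotReady" >&2
    return 1
  fi
  echo "ok: ${notready} control plane node(s) NotReady"
}
```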
- Make sure you have access to the oc command with cluster-admin privileges, for example:

    $ oc whoami
    system:admin

- Check that there are no Pending CSRs:

    $ oc get csr | grep Pending

- Create a file named recover_kubeconfig.sh on your bastion host with the following contents:

    #!/bin/bash
    set -euo pipefail

    # Internal API URL (api-int) that the kubelet on the node uses to reach the API server
    intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")

    # Name of the cluster entry used by the current (cluster-admin) context
    context="$(oc config current-context)"
    cluster="$(oc config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"

    # CA bundle and token of the node-bootstrapper service account
    ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
    token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"

    # Build the new kubeconfig in temporary files and remove them when the script exits
    export KUBECONFIG="$(mktemp)"
    ca_crt="$(mktemp)"
    trap 'rm -f "$KUBECONFIG" "$ca_crt"' EXIT
    echo "$ca_crt_data" > "$ca_crt"

    oc config set-credentials "kubelet" --token="$token" >/dev/null
    oc config set-cluster "$cluster" --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
    oc config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
    oc config use-context kubelet >/dev/null

    # Print the resulting bootstrap kubeconfig to stdout
    cat "$KUBECONFIG"

- Run the script to generate a bootstrap kubeconfig for the node:

    $ chmod 755 recover_kubeconfig.sh
    $ ./recover_kubeconfig.sh > kubeconfig-bootstrap

- SSH into the affected node and, as root, stop the kubelet service and back up the old node certificates:

    # systemctl stop kubelet
    # mkdir -p /root/backup-certs
    # cp -a /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig /root/backup-certs
    # rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig

- Transfer the generated kubeconfig-bootstrap file from the bastion host to the node (for example, with scp) and copy it to /etc/kubernetes/kubeconfig:

    # cp <kubeconfig-bootstrap> /etc/kubernetes/kubeconfig

- Start the kubelet service again:

    # systemctl start kubelet

- Back on the bastion host, wait a few seconds, check the certificate signing requests, and approve the client certificate CSR that the node submitted with the bootstrap credentials:

    $ oc get csr
    NAME        AGE   REQUESTOR                                                                   CONDITION
    csr-jsv6z   11s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    $ oc adm certificate approve csr-jsv6z
    certificatesigningrequest.certificates.k8s.io/csr-jsv6z approved

  Wait a few more minutes, then approve the new kubelet serving certificate CSR requested by the node itself (system:node:<node_name>):

    $ oc get csr
    NAME        AGE   REQUESTOR                                                                   CONDITION
    csr-jsv6z   75s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-smqlc   40s   system:node:compute-1                                                       Pending
    $ oc adm certificate approve csr-smqlc
    certificatesigningrequest.certificates.k8s.io/csr-smqlc approved
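The two approval rounds above can also be scripted by filtering Pending CSRs on their requestor. This is a hedged sketch, not an official tool: approve_pending_csrs_from is a hypothetical helper that uses the go-template filter {{if not .status}} to select CSRs without an approval status and reads the requestor from .spec.username. Review what it approves before using it on a production cluster.

```shell
# Hypothetical helper: approve Pending CSRs whose requestor equals $1.
approve_pending_csrs_from() {
  local requestor="$1" name user list approved=0
  # One "name requestor" pair per Pending CSR (CSRs with no status yet)
  list="$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}} {{.spec.username}}{{"\n"}}{{end}}{{end}}')"
  while read -r name user; do
    [ -n "$name" ] || continue
    if [ "$user" = "$requestor" ]; then
      oc adm certificate approve "$name"
      approved=$((approved + 1))
    fi
  done <<EOF
$list
EOF
  echo "approved ${approved} CSR(s) from ${requestor}"
}
```

For the first round, pass system:serviceaccount:openshift-machine-config-operator:node-bootstrapper; for the second, pass system:node:<node_name>, for example system:node:compute-1. Filtering on the requestor keeps unrelated Pending CSRs from other nodes from being approved by accident.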
After this, the node should return to the Ready state. You can check it with oc get nodes.
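Instead of polling manually, oc wait can block until the node reports the Ready condition. A minimal sketch follows; wait_node_ready and its default timeout of 600s are illustrative assumptions, not values from this solution.

```shell
# Hypothetical helper: block until the node reports Ready, then show it.
# Usage: wait_node_ready compute-1 [timeout]
wait_node_ready() {
  local node="$1" timeout="${2:-600s}"
  # oc wait exits non-zero if the condition is not met before the timeout
  oc wait --for=condition=Ready "node/${node}" --timeout="${timeout}" || return 1
  oc get node "${node}"
}
```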
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.