How to renew etcd certificates in OpenShift 4.8 and lower when certificates are already expired and etcd is encrypted
Environment
- Red Hat OpenShift Container Platform
  - 4.6
  - 4.7
  - 4.8
- Etcd is encrypted
Issue
- How to renew the etcd certificates in OpenShift 4.8 and lower when the certificates are already expired?
- Kube-apiserver shows "x509 certificate is not valid" in the logs.
Check the diagnostic steps of this solution to verify that the etcd certificates are expired.
Resolution
- Stop all the static pods on all masters (like in the original KCS instructions)
Perform the following steps on each master node:
$ mkdir -v /etc/kubernetes/manifests-backup/
$ mv /etc/kubernetes/manifests/* /etc/kubernetes/manifests-backup/
$ crictl ps | grep -e "etcd\|kube-apiserver\|kube-controller\|kube-scheduler" # Wait until the only containers still listed in this output belong to operators
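The wait in the last command can be scripted. The following is a minimal sketch, not part of the original procedure: the function name, the polling interval and the attempt limit are hypothetical, and it assumes that operator containers can be recognized by the word "operator" in their crictl ps line.

```shell
#!/bin/bash
# Poll crictl until no control plane static pod containers remain.
# Lines containing "operator" are ignored (assumption: operator containers
# can be recognized this way in the crictl ps output).
wait_for_static_pods_stopped() {
  local interval=${1:-5}    # seconds between polls (hypothetical default)
  local attempts=${2:-60}   # maximum number of polls (hypothetical default)
  local pattern='etcd|kube-apiserver|kube-controller|kube-scheduler'
  local i remaining
  for ((i = 0; i < attempts; i++)); do
    remaining=$(crictl ps | grep -E "$pattern" | grep -v operator || true)
    if [ -z "$remaining" ]; then
      echo "all static pod containers have stopped"
      return 0
    fi
    sleep "$interval"
  done
  echo "still running:" >&2
  echo "$remaining" >&2
  return 1
}

# Usage on each master node, after moving the manifests away:
#   wait_for_static_pods_stopped 5 60
```

The function returns non-zero if containers are still listed after the last attempt, so it can gate the next step in a script.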
- Create a self-signed "fake signer"
openssl req -newkey rsa:4096 \
-x509 \
-sha512 \
-days 3650 \
-nodes \
-out fake-signer.crt \
-keyout fake-signer.key \
-subj "/O=EtcdRestoring/OU=EtcdRestoring/CN=etcd-restore-fake-ca"
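Before using the fake signer, it may be worth confirming that the file is a valid self-signed certificate with the expected subject. A small sketch under that assumption follows; the function name is hypothetical, and only standard openssl verify and openssl x509 options are used.

```shell
#!/bin/bash
# Confirm that a certificate is self-signed and print its subject and expiry.
check_self_signed() {
  local crt=$1
  # A self-signed certificate verifies against itself as the only trusted CA.
  openssl verify -CAfile "$crt" "$crt" >/dev/null || return 1
  openssl x509 -noout -subject -enddate -in "$crt"
}

# Usage:
#   check_self_signed fake-signer.crt
```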
- Gather all the subjectAltNames like in the original procedure, by running this command on each expired certificate:
openssl x509 -noout -ext "subjectAltName" -in ${EXPIRED_CERT} | grep -v X509v3 | sed 's/^ *//;s/ Address//g'
This can be done at any time
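To collect the subjectAltNames of several certificates in one pass, the command above can be wrapped in a loop. This is only a sketch: the function names are hypothetical, and the directory that holds the expired certificates is passed as an argument because it depends on the environment.

```shell
#!/bin/bash
# Same filter as the command above: drop the "X509v3 ..." header line,
# strip leading spaces, and turn "IP Address:" into "IP:".
normalize_san() {
  grep -v X509v3 | sed 's/^ *//;s/ Address//g'
}

# Print "<file>: <SAN list>" for every *.crt file in a directory.
print_sans() {
  local dir=$1 cert san
  for cert in "$dir"/*.crt; do
    [ -e "$cert" ] || continue
    san=$(openssl x509 -noout -ext "subjectAltName" -in "$cert" | normalize_san)
    printf '%s: %s\n' "$cert" "$san"
  done
}

# Usage (the directory is an example, adjust it to where the expired
# certificates are stored on the node):
#   print_sans /path/to/expired/certs
```

The normalized line has the same "DNS:..., IP:..." form that the SAN variables of the certificate script expect.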
- Create a fake-etcd-certificates.sh script (similar to the renew-etcd-certificates.sh script from the solution for non-encrypted etcd, but using our fake signer). In the example, it would look like this:
NOTE: You must complete this script with your node FQDNs in place of etcd-peer-master-1.example.com and your node IPs in place of 192.168.24.243. You must also copy the section for master 1 and complete it for masters 2 and 3 as well.
#!/bin/bash
create() {
# create the private key and the CSR
openssl genrsa -out $TARGET.key 2048
OPENSSL_CNF=/etc/pki/tls/openssl.cnf
openssl req -new -sha256 \
-key $TARGET.key \
-subj "/O=$O/CN=$CN" \
-reqexts SAN \
-config <(cat ${OPENSSL_CNF} \
<(printf "\n[SAN]\nsubjectAltName=${SAN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment \nextendedKeyUsage=serverAuth, clientAuth")) \
-out $TARGET.csr
# sign the csr
openssl x509 \
-req \
-sha256 \
-extfile <(printf "subjectAltName=${SAN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment \nextendedKeyUsage=serverAuth, clientAuth") \
-days 2000 \
-in $TARGET.csr \
-CA $CACRT \
-CAkey $CAKEY \
-CAcreateserial -out $TARGET.crt
}
# Replace example FQDN etcd-peer-master-1.example.com and IP 192.168.24.243 to match your environment
# For master node 1
TARGET="etcd-peer-master-1.example.com"
O="system:etcd-peers"
CN="system:etcd-peer:etcd-client"
SAN="DNS:localhost, DNS:192.168.24.243, IP:192.168.24.243"
CACRT=fake-signer.crt
CAKEY=fake-signer.key
create
TARGET="etcd-serving-master-1.example.com"
O="system:etcd-servers"
CN="system:etcd-server:etcd-client"
SAN="DNS:etcd.kube-system.svc, DNS:etcd.kube-system.svc.cluster.local, DNS:etcd.openshift-etcd.svc, DNS:etcd.openshift-etcd.svc.cluster.local, DNS:localhost, DNS:::1, DNS:127.0.0.1, DNS:192.168.24.243, DNS:::1, IP:0:0:0:0:0:0:0:1, IP:127.0.0.1, IP:192.168.24.243, IP:0:0:0:0:0:0:0:1"
CACRT=fake-signer.crt
CAKEY=fake-signer.key
create
TARGET="etcd-serving-metrics-master-1.example.com"
O="system:etcd-metrics"
CN="system:etcd-metric:etcd-client"
SAN="DNS:etcd.kube-system.svc, DNS:etcd.kube-system.svc.cluster.local, DNS:etcd.openshift-etcd.svc, DNS:etcd.openshift-etcd.svc.cluster.local, DNS:localhost, DNS:::1, DNS:127.0.0.1, DNS:192.168.24.243, DNS:::1, IP:0:0:0:0:0:0:0:1, IP:127.0.0.1, IP:192.168.24.243, IP:0:0:0:0:0:0:0:1"
CACRT=fake-signer.crt
CAKEY=fake-signer.key
create
# NOTE
# Copy the entire master 1 section and use it to fill out the correct details for the other two nodes below:
# For master node 2
# For master node 3
WARNING: This script is not ready to run. You must complete the sections for master 2 and 3, and fill out IP and FQDN for each node
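A quick pre-flight check can catch a script that still contains the example values or is missing sections. The sketch below is an assumption-laden helper, not part of the original solution: the function name is hypothetical, it expects nine create calls (three certificates for each of three masters), and it would report a false positive if the real environment happens to use example.com or 192.168.24.243.

```shell
#!/bin/bash
# Check that the certificate script was customized before running it.
# Returns 0 when no example values remain outside comments and the number
# of "create" calls matches the expected count.
check_script_complete() {
  local script=$1 expected=${2:-9} leftovers calls rc=0
  # Lines that still mention the example FQDN suffix or IP, comments excluded.
  leftovers=$(grep -nE 'example\.com|192\.168\.24\.243' "$script" \
    | grep -vE '^[0-9]+:[[:space:]]*#' || true)
  if [ -n "$leftovers" ]; then
    echo "example values still present:" >&2
    echo "$leftovers" >&2
    rc=1
  fi
  # Count standalone "create" invocations (the function definition is not counted).
  calls=$(grep -cE '^[[:space:]]*create[[:space:]]*$' "$script" || true)
  if [ "$calls" -ne "$expected" ]; then
    echo "found $calls create calls, expected $expected" >&2
    rc=1
  fi
  return "$rc"
}

# Usage:
#   check_script_complete fake-etcd-certificates.sh && bash fake-etcd-certificates.sh
```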
- Copy the resulting files to /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/ on all masters.
- Get the kube-apiserver revision and write it down for the next step (ideally, it should be the same on all control plane nodes, but if the cluster was in the middle of a kube-apiserver rollout, it might be different on each one, so watch out):
$ jq -r .metadata.labels.revision /etc/kubernetes/manifests-backup/kube-apiserver-pod.yaml
- Edit the following files to append the contents of fake-signer.crt at their end:
  - /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-serving-ca/ca-bundle.crt
  - /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt
  - /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-metrics-proxy-serving-ca/ca-bundle.crt
  - /etc/kubernetes/static-pod-resources/kube-apiserver-pod-${KUBE-APISERVER-REVISION}/configmaps/etcd-serving-ca/ca-bundle.crt (where ${KUBE-APISERVER-REVISION} must be replaced by the kube-apiserver revision obtained in the previous step).
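Instead of editing each bundle by hand, the append can be scripted. The following is a sketch under stated assumptions: the function name and the .bak backup suffix are choices made for this example, and REVISION is a hypothetical variable name. Note that ${KUBE-APISERVER-REVISION} is only a placeholder in the text; typed literally in a shell it would expand to the string "APISERVER-REVISION" when KUBE is unset, so a valid variable name is used here.

```shell
#!/bin/bash
# Append a signer certificate to each CA bundle, keeping a .bak copy first.
# The .bak suffix is a choice made for this sketch, not part of the procedure.
append_signer() {
  local signer=$1 bundle
  shift
  for bundle in "$@"; do
    if [ ! -f "$bundle" ]; then
      echo "missing bundle: $bundle" >&2
      return 1
    fi
    cp -p "$bundle" "$bundle.bak"
    # Add a newline first if the bundle does not already end with one,
    # so the appended PEM block starts on its own line.
    if [ -n "$(tail -c1 "$bundle")" ]; then
      echo >> "$bundle"
    fi
    cat "$signer" >> "$bundle"
  done
}

# Example call (set REVISION to the revision obtained in the previous step):
#   base=/etc/kubernetes/static-pod-resources
#   append_signer fake-signer.crt \
#     "$base/etcd-certs/configmaps/etcd-serving-ca/ca-bundle.crt" \
#     "$base/etcd-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt" \
#     "$base/etcd-certs/configmaps/etcd-metrics-proxy-serving-ca/ca-bundle.crt" \
#     "$base/kube-apiserver-pod-$REVISION/configmaps/etcd-serving-ca/ca-bundle.crt"
```

The function stops at the first missing bundle, which helps catch a wrong revision number before etcd is started.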
- Start etcd cluster only
$ mv /etc/kubernetes/manifests-backup/etcd-pod.yaml /etc/kubernetes/manifests
- Check that the containers started successfully and the etcd cluster is now working
$ crictl ps --name etcd
$ crictl exec $(crictl ps --name etcdctl -q) etcdctl endpoint status -w table
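The etcd containers may need some time to start, so the check can be retried. The sketch below is an illustration only: the function name, interval and attempt count are hypothetical, and it assumes that etcdctl endpoint health run inside the etcdctl container reflects the state of the cluster.

```shell
#!/bin/bash
# Retry "etcdctl endpoint health" inside the etcdctl container until it succeeds.
wait_for_etcd_healthy() {
  local interval=${1:-10}   # seconds between attempts (hypothetical default)
  local attempts=${2:-30}   # maximum number of attempts (hypothetical default)
  local i cid
  for ((i = 0; i < attempts; i++)); do
    # Take the first matching container ID, if any.
    cid=$(crictl ps --name etcdctl -q | head -n1 || true)
    if [ -n "$cid" ] && crictl exec "$cid" etcdctl endpoint health; then
      return 0
    fi
    sleep "$interval"
  done
  echo "etcd did not become healthy in time" >&2
  return 1
}

# Usage:
#   wait_for_etcd_healthy 10 30
```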
- Start kube-apiserver only:
$ mv /etc/kubernetes/manifests-backup/kube-apiserver-pod.yaml /etc/kubernetes/manifests
- Check if kube-apiserver started (container should be running for at least 1 minute)
$ crictl ps --name kube-apiserver
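One way to check the "running for at least 1 minute" condition without reading the CREATED column is to compare the container IDs before and after a pause: if they are unchanged, no container restarted in between. This is a sketch with a hypothetical function name; the 60 second default mirrors the one minute mentioned above.

```shell
#!/bin/bash
# Succeed if the set of running containers matching a name is non-empty
# and identical before and after the given interval.
container_is_stable() {
  local name=$1 interval=${2:-60} before after
  before=$(crictl ps --name "$name" -q | sort)
  [ -n "$before" ] || return 1
  sleep "$interval"
  after=$(crictl ps --name "$name" -q | sort)
  [ "$before" = "$after" ]
}

# Usage:
#   container_is_stable kube-apiserver 60 && echo "kube-apiserver is stable"
```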
- Export the localhost-recovery.kubeconfig and obtain the real etcd-signer with it
export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig
oc get secret etcd-signer -n openshift-config -o jsonpath="{.data.tls\.crt}" | base64 -d > etcd-signer.crt
oc get secret etcd-signer -n openshift-config -o jsonpath="{.data.tls\.key}" | base64 -d > etcd-signer.key
oc get secret etcd-metric-signer -n openshift-config -o jsonpath="{.data.tls\.crt}" | base64 -d > etcd-metric-signer.crt
oc get secret etcd-metric-signer -n openshift-config -o jsonpath="{.data.tls\.key}" | base64 -d > etcd-metric-signer.key
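If an oc command fails, the redirect still creates an empty file, so it is worth checking the four files before stopping the control plane. A minimal sketch follows; the function name is hypothetical and it only checks for a non-empty file with a PEM header, not that each key matches its certificate.

```shell
#!/bin/bash
# Check that each file exists, is non-empty and contains a PEM header line.
check_pem_files() {
  local f rc=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    elif ! grep -q -- '-----BEGIN .*-----' "$f"; then
      echo "no PEM header: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_pem_files etcd-signer.crt etcd-signer.key \
#     etcd-metric-signer.crt etcd-metric-signer.key
```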
- Stop the control plane again.
- Apply the "How to renew etcd certificates in OpenShift 4.8 when certificates are already expired" solution using the etcd-signer.crt, etcd-signer.key, etcd-metric-signer.crt and etcd-metric-signer.key that we have obtained here.
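The "Stop the control plane again" step does not list commands. Assuming it is done the same way as the first step of this resolution (moving the static pod manifests out of the manifests directory), it could be sketched as follows. The function name is hypothetical, and the default paths are the ones used earlier in this solution.

```shell
#!/bin/bash
# Move every static pod manifest into the backup directory so that the
# kubelet stops the corresponding pods.
stop_static_pods() {
  local manifests=${1:-/etc/kubernetes/manifests}
  local backup=${2:-/etc/kubernetes/manifests-backup}
  local f
  mkdir -p "$backup"
  for f in "$manifests"/*; do
    [ -e "$f" ] || continue   # nothing to move if the directory is empty
    mv -v "$f" "$backup"/
  done
}

# Usage on each master node:
#   stop_static_pods
#   crictl ps | grep -e "etcd\|kube-apiserver"   # repeat until the containers are gone
```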
Root Cause
The etcd certificates in OpenShift 4.6, 4.7 and 4.8 are not automatically rotated.
They will expire after 3 years.
The issue is fixed in OpenShift 4.9 and higher, where automatic rotation of the etcd certificates is implemented.
In order to generate new etcd certificates as per the usual procedure, we need the etcd signer. However, if etcd is encrypted, we cannot just extract it from backup.
This solution first starts etcd temporarily with certificates signed with a fake signer, so it can start up and kube-apiserver can access it. Then we retrieve the real etcd signer and apply the usual procedure.
The control plane should run with the fake etcd CA for as short a time as possible, because other cluster components may not accept this CA and may eventually remove it from the bundles where we had to add it for this procedure to work. For this reason, this procedure must be used only to gather the etcd-signer required by the "How to renew etcd certificates in OpenShift 4.8 and lower when certificates are already expired" solution. Do not attempt the "How to renew etcd certificates in OpenShift 4.8 when certificates are not expired" solution directly with the fake CA in place.
Diagnostic Steps
Same as in the "How to renew etcd certificates in OpenShift 4.8 and lower when certificates are already expired" solution. The only difference is that this solution must be run first in clusters with expired certificates.