How to renew etcd certificates in OpenShift 4.8 and lower when certificates are already expired

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform:
    • 4.5
    • 4.6
    • 4.7
    • 4.8

Issue

  • How to renew the etcd certificates in OpenShift 4.8 and lower when the certificates are already expired?
  • Kube-apiserver shows in the logs "x509 certificate is not valid".

See the Diagnostic Steps section to verify that the etcd certificates are expired.

Resolution

Prerequisites:

  • SSH access to all control plane nodes
  • Root access on the control plane nodes
  • etcd encryption is not enabled (if etcd encryption is enabled, check this solution first).
  • openssl on the host where the renew-etcd-certificates.sh script (described below) will be run (already available on RHCOS hosts)
  • Bastion node able to copy files from and to the control plane nodes

If the etcd certificates are expired, an administrator needs to re-create them manually. In the process, the etcd-signer and etcd-metric-signer certificates and keys need to be obtained, either from the API when it is running or directly from the etcd database.
Without the signer certificates and keys, the etcd peer and serving certificates can't be re-created. If it is not possible to obtain the signer certificates because etcd is encrypted, check this solution, which explains how to get them so that this solution can be applied.

Restore Steps

ATTENTION: Please read through the full article before attempting.

1. Obtain the signer certificates and keys directly from the database (the same certificate will appear twice)

If the Kube Apiserver is down and the etcd database is timing out on etcdctl commands, run the following on a control plane node:

$ grep -A55 -a "openshift-config/etcd-signer" /var/lib/etcd/member/snap/db | sed -n '/-----BEGIN/,/-----END/ p' | sed 's/^.*-----BEGIN/-----BEGIN/g'

-----BEGIN CERTIFICATE-----
MIIDOTCCAiGgAwIBAgIIH/IRGmbr47YwDQYJKoZIhvcNAQELBQAwKjESMBAGA1UE
CxMJb3BlbnNoaWZ0MRQwEgYDVQQDEwtldGNkLXNpZ25lcjAeFw0yMzA3MTEyMzE5
NThaFw0zMzA3MDgyMzE5NTlaMCoxEjAQBgNVBAsTCW9wZW5zaGlmdDEUMBIGA1UE
...

$ grep -A55 -a "openshift-config/etcd-metric-signer" /var/lib/etcd/member/snap/db | sed -n '/-----BEGIN/,/-----END/ p' | sed 's/^.*-----BEGIN/-----BEGIN/g'| less

Save the signer certificate to etcd-signer.crt and the key to etcd-signer.key. Do the same for the metric signer, using etcd-metric-signer.crt and etcd-metric-signer.key.
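
Because the certificate appears twice in the output, splitting it by hand is error-prone. The following is a minimal sketch, assuming the output of one of the commands above was redirected to a file. The helper name split_pem and the file name etcd-signer.pem are hypothetical and not part of the original procedure.

```shell
# Hypothetical helper: write the first certificate block to <basename>.crt
# and the first private key block to <basename>.key.
# Usage: split_pem <input.pem> <basename>
split_pem() {
  (
    umask 077   # the key file must not be readable by other users
    awk -v crt="$2.crt" -v key="$2.key" '
      /-----BEGIN CERTIFICATE-----/ && !cert_done { out = crt }
      /-----BEGIN .*PRIVATE KEY-----/ && !key_done { out = key }
      out != "" { print > out }
      /-----END CERTIFICATE-----/ { if (out == crt) cert_done = 1; out = "" }
      /-----END .*PRIVATE KEY-----/ { if (out == key) key_done = 1; out = "" }
    ' "$1"
  )
}

# Example (file names are assumptions):
# split_pem etcd-signer.pem etcd-signer
```

Inspect both resulting files before using them; the sketch only separates PEM blocks and does not validate their contents.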

2. Stop the control plane

The services must be stopped on each control plane node before the restore! Use SSH to access each control plane node:

# mkdir -v /etc/kubernetes/manifests-backup/

# mv -v /etc/kubernetes/manifests/* /etc/kubernetes/manifests-backup/

# crictl ps --name '^(etcd|kube-(apiserver|controller-manager|scheduler))$'

If the containers are not automatically stopped, stop them manually:

# crictl stop $(crictl ps --name '^(etcd|kube-(apiserver|controller-manager|scheduler))$' -q)

Ensure no manifests remain in the directory after the mv command and the stop of the containers, and that none of the containers are still running:

# ls /etc/kubernetes/manifests/
# crictl ps --name '^(etcd|kube-(apiserver|controller-manager|scheduler))$'
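
The two checks above can be combined into a single pass/fail test. This is a sketch only: the function name control_plane_stopped and the MANIFEST_DIR override are hypothetical, while the crictl invocation is the one already used in this step.

```shell
# Sketch: succeed only when no static pod manifests remain and crictl
# lists no control plane containers.
# MANIFEST_DIR is a hypothetical override; the default is the path used in this article.
control_plane_stopped() {
  manifest_dir=${MANIFEST_DIR:-/etc/kubernetes/manifests}
  # Any remaining entry means the kubelet may restart a static pod.
  if [ -n "$(ls -A "$manifest_dir" 2>/dev/null)" ]; then
    echo "manifests still present in $manifest_dir" >&2
    return 1
  fi
  # -q prints only container IDs; empty output means nothing is running.
  if [ -n "$(crictl ps --name '^(etcd|kube-(apiserver|controller-manager|scheduler))$' -q)" ]; then
    echo "control plane containers still running" >&2
    return 1
  fi
  return 0
}

# Example: control_plane_stopped && echo "safe to continue"
```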

3. Take a backup copy of etcd keys and certs in all control plane nodes

Use SSH to access the control plane nodes

Make sure to perform the action on each control plane node!

To perform the copy:

# mkdir -v /etc/kubernetes/etcd-certs-backup-$(date +%Y%m%d)

# cp -rvf /etc/kubernetes/static-pod-resources/etcd-certs/secrets/* /etc/kubernetes/etcd-certs-backup-$(date +%Y%m%d)/

Verify the copy:

# diff --recursive --report-identical-files --brief /etc/kubernetes/static-pod-resources/etcd-certs/secrets/ /etc/kubernetes/etcd-certs-backup-$(date +%Y%m%d)/
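
The commands above evaluate $(date +%Y%m%d) three times, so they would point at different directories if run across midnight. A minimal sketch that computes the name once and verifies the copy follows. The function name backup_etcd_certs and the BACKUP_DIR variable are hypothetical, and cp -a is used here instead of cp -rvf so that file modes are preserved.

```shell
# Sketch: copy a secrets directory and verify the copy.
# Usage: backup_etcd_certs <source_dir> <backup_dir>
backup_etcd_certs() {
  src=$1
  dst=$2
  mkdir -p "$dst" || return 1
  # "src/." copies the directory contents and preserves modes and ownership.
  cp -a "$src/." "$dst/" || return 1
  # diff exits non-zero if any file differs or is missing.
  diff -r -q "$src" "$dst"
}

# Compute the date once so that every command uses the same directory name.
BACKUP_DIR=/etc/kubernetes/etcd-certs-backup-$(date +%Y%m%d)
# Example:
# backup_etcd_certs /etc/kubernetes/static-pod-resources/etcd-certs/secrets "$BACKUP_DIR"
```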

4. Create the etcd certificates and keys

This step uses a script that reads the subject and SANs of the existing certificates and generates new ones signed with the etcd-signer and etcd-metric-signer obtained in step 1, placing them in a new_certificates subfolder.

Copy the etcd-signer.{crt,key} and etcd-metric-signer.{crt,key} to the directory .../etcd-all-certs.
Hint: Use cp or vi etcd-signer.crt and copy+paste the certificate to the file.

# cp -v etcd-signer.{crt,key}  /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs

# cp -v etcd-metric-signer.{crt,key}  /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs

Change the current directory:

# cd /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs

Use the information to create the keys and get the certificates signed by the signers that were extracted in the earlier step. Create the following script in the above path:

# vi renew-etcd-certificates.sh

Copy and paste the following script. The script will create all certificates in a new directory called new_certificates:

#!/bin/bash

create() {
  # print the parameters, then create the key and the CSR
  echo $O
  echo $CN
  echo $SAN
  echo $TARGET
  echo $CACRT
  echo $CAKEY
  openssl genrsa -out $TARGET.key 2048
  OPENSSL_CNF=/etc/pki/tls/openssl.cnf
  openssl req -new -sha256 \
    -key $TARGET.key \
    -subj "/O=$O/CN=$CN"  \
    -reqexts SAN \
    -config <(cat ${OPENSSL_CNF} \
        <(printf "\n[SAN]\nsubjectAltName=${SAN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment \nextendedKeyUsage=serverAuth, clientAuth")) \
    -out $TARGET.csr

# sign the csr
  openssl x509 \
    -req \
    -sha256 \
    -extfile <(printf "subjectAltName=${SAN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=digitalSignature, keyEncipherment \nextendedKeyUsage=serverAuth, clientAuth") \
    -days 2000 \
    -in $TARGET.csr \
    -CA $CACRT \
    -CAkey $CAKEY \
    -CAcreateserial -out $TARGET.crt
}
# -p keeps the script re-runnable if the directory already exists
mkdir -p new_certificates
files=$(ls -1 *.crt|grep -E "etcd-serving|etcd-peer")
for i in $files;
  do name=$i
     O=$(openssl x509 -in $i -noout -subject| awk {'print $3'}|sed s/,//g|tr -d '\n\t\r ')
     CN=$(openssl x509 -in $i -noout -subject| awk {'print $6'}|sed s/,//g|tr -d '\n\t\r ')
     SAN=$(openssl x509 -in $i -noout -ext "subjectAltName"|sed s/'X509v3 Subject Alternative Name:'//g|sed s/Address//g|tr -d '\n\t\r ')
     # strip only the trailing .crt extension from the file name
     TARGET=$(echo "new_certificates/$i"|sed 's/\.crt$//')
     if [[ $i == *"metric"* ]]; then
        CACRT=etcd-metric-signer.crt
        CAKEY=etcd-metric-signer.key
     else
        CACRT=etcd-signer.crt
        CAKEY=etcd-signer.key
     fi
     create
done

Attention: If the script is run on a machine other than the control plane nodes (to which the certificates need to be copied), note that it only works on RHEL 8 systems. On RHEL 7 the command will fail or not work properly because of the older openssl version.
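
To see what the script passes to openssl as subjectAltName, the SAN clean-up pipeline can be run on its own. The sketch below wraps the same sed and tr commands in a hypothetical function, normalize_san, and feeds it made-up sample input in the shape that openssl x509 -ext subjectAltName prints.

```shell
# Same transformation the script applies to the SAN text:
# drop the header, turn "IP Address:" into "IP:", remove all whitespace.
normalize_san() {
  sed 's/X509v3 Subject Alternative Name://g' | sed 's/Address//g' | tr -d '\n\t\r '
}

# Sample input (hypothetical values)
printf 'X509v3 Subject Alternative Name: \n    DNS:localhost, IP Address:10.0.1.1\n' | normalize_san
# prints: DNS:localhost,IP:10.0.1.1
```

The result is a single comma-separated line, which is the form the subjectAltName= setting in the script expects.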

Save the script and execute it:

# chmod +x renew-etcd-certificates.sh

# ./renew-etcd-certificates.sh

Check that the keys and certs were created:

# cd new_certificates

# ls -l ./*.{crt,key}

-rw-------. 1 root root 2490 Jul 22 19:49 etcd-peer-master-0.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-peer-master-0.example.com.key
-rw-------. 1 root root 2486 Jul 22 19:49 etcd-peer-master-1.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-peer-master-1.example.com.key
-rw-------. 1 root root 2490 Jul 22 19:49 etcd-peer-master-2.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-peer-master-2.example.com.key
-rw-------. 1 root root 2750 Jul 22 19:49 etcd-serving-master-0.example.com.crt
-rw-------. 1 root root 1675 Jul 22 19:49 etcd-serving-master-0.example.com.key
-rw-------. 1 root root 2750 Jul 22 19:49 etcd-serving-master-1.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-serving-master-1.example.com.key
-rw-------. 1 root root 2750 Jul 22 19:49 etcd-serving-master-2.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-serving-master-2.example.com.key
-rw-------. 1 root root 2778 Jul 22 19:49 etcd-serving-metrics-master-0.example.com.crt
-rw-------. 1 root root 1679 Jul 22 19:49 etcd-serving-metrics-master-0.example.com.key
-rw-------. 1 root root 2774 Jul 22 19:49 etcd-serving-metrics-master-1.example.com.crt
-rw-------. 1 root root 1675 Jul 22 19:49 etcd-serving-metrics-master-1.example.com.key
-rw-------. 1 root root 2778 Jul 22 19:49 etcd-serving-metrics-master-2.example.com.crt
-rw-------. 1 root root 1675 Jul 22 19:49 etcd-serving-metrics-master-2.example.com.key

Verify that the certificates are valid:

# find . -iname "*.crt" -print -exec openssl x509 -dates -noout -in  {} \;
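
Besides the validity dates, it can be worth confirming that each new certificate belongs to the key next to it. The following is a sketch under the assumption that openssl is available on the host; the function name cert_matches_key is hypothetical. It compares the public key embedded in the certificate with the one derived from the private key.

```shell
# Sketch: return 0 if the certificate and the private key share the same public key.
# Usage: cert_matches_key <cert.crt> <key.key>
cert_matches_key() {
  crt_pub=$(openssl x509 -noout -pubkey -in "$1") || return 2
  key_pub=$(openssl pkey -pubout -in "$2") || return 2
  [ "$crt_pub" = "$key_pub" ]
}

# Example, run inside new_certificates:
# for crt in ./*.crt; do
#   cert_matches_key "$crt" "${crt%.crt}.key" || echo "MISMATCH: $crt"
# done
```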

5. Compress the files and copy the archive to all the control plane nodes

# tar -cvzf etcd-all-certs.tar.gz ./*{.crt,.key}

Verify that the files are in the archive:

# tar --list -f etcd-all-certs.tar.gz

# scp -i [identity_file] etcd-all-certs.tar.gz core@masterN:/tmp
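
Before copying the archive around, a quick completeness check can catch a certificate that was packed without its key. This is a sketch only; the function name archive_has_pairs is hypothetical and it relies on the same tar --list call shown above.

```shell
# Sketch: fail if the archive is empty or any .crt entry lacks a matching .key entry.
# Usage: archive_has_pairs <archive.tar.gz>
archive_has_pairs() {
  list=$(tar --list -f "$1") || return 1
  [ -n "$list" ] || return 1
  status=0
  for crt in $(printf '%s\n' "$list" | grep '\.crt$'); do
    key="${crt%.crt}.key"
    # -x matches the whole line, -F treats the name as a fixed string
    if ! printf '%s\n' "$list" | grep -qxF -- "$key"; then
      echo "missing key for $crt" >&2
      status=1
    fi
  done
  return $status
}

# Example: archive_has_pairs etcd-all-certs.tar.gz && echo "archive complete"
```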

6. Overwrite the key and cert files on each control plane node with the newly created ones

For each control plane node, execute the following:
List the directories. If any directory is missing, create it:

# ls -l /etc/kubernetes/static-pod-resources/etcd-certs/secrets/

# mkdir -v /etc/kubernetes/static-pod-resources/etcd-certs/secrets/{etcd-all-certs,etcd-all-peer,etcd-all-serving,etcd-all-serving-metrics}

Remove all the certs:

# rm -v /etc/kubernetes/static-pod-resources/etcd-certs/secrets/{etcd-all-certs,etcd-all-peer,etcd-all-serving,etcd-all-serving-metrics}/*.{crt,key}

Extract the archive and copy all certs to the correct location:

# cd /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs

# tar -xvf /tmp/etcd-all-certs.tar.gz -C /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/

Copy the certificates to the directories {etcd-all-peer, etcd-all-serving, etcd-all-serving-metrics}:

# cd /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/

# find /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/ -maxdepth 1 -iname 'etcd-peer-*' -exec cp -v {} /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-peer/ \;

# find /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/ -maxdepth 1 -iname 'etcd-serving-*' ! -iname "etcd-serving-metrics*" -exec cp -v {} /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-serving/ \;

# find /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/ -maxdepth 1 -iname 'etcd-serving-metrics*' -exec cp -v {} /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-serving-metrics/ \;
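
The three find commands sort the files by name prefix. The same routing can be sketched as one function, which makes the rule explicit: metrics certificates are matched before the more general serving pattern. The function name distribute_certs is hypothetical. Unlike find -iname, the patterns here are case-sensitive, and the target directories must already exist.

```shell
# Sketch: copy peer, serving and serving-metrics files from etcd-all-certs
# into their dedicated directories.
# Usage: distribute_certs <secrets_base_dir>
distribute_certs() {
  base=$1
  src=$base/etcd-all-certs
  for f in "$src"/etcd-peer-* "$src"/etcd-serving-*; do
    [ -e "$f" ] || continue   # skip a pattern that matched nothing
    case ${f##*/} in
      etcd-peer-*)           dest=etcd-all-peer ;;
      etcd-serving-metrics*) dest=etcd-all-serving-metrics ;;
      etcd-serving-*)        dest=etcd-all-serving ;;
    esac
    cp -v "$f" "$base/$dest/"
  done
}

# Example:
# distribute_certs /etc/kubernetes/static-pod-resources/etcd-certs/secrets
```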

List all the certificates (verification step):

# find /etc/kubernetes/static-pod-resources/etcd-certs/secrets/ -iname "*.crt" -print -exec openssl x509 -dates -noout -in  {} \;

7. Start the etcd database in all control plane nodes (one by one)

To start the etcd pod, move the manifest to /etc/kubernetes/manifests/:

# mv -v /etc/kubernetes/manifests-backup/etcd-pod.yaml /etc/kubernetes/manifests/

Watch the containers start. Once the etcd containers are running, press Ctrl+C to quit the command:

# watch 'crictl ps --name etcd' 

The command below should print the status of all etcd peers:

# crictl exec $(crictl ps --name etcdctl -q) etcdctl endpoint status -w table 

8. If the etcd started correctly, start the Kube Apiserver, Kube Controller and Kube Scheduler in all control plane nodes

# mv -v /etc/kubernetes/manifests-backup/* /etc/kubernetes/manifests/

# watch "crictl ps --name '^(etcd|kube-(apiserver|controller-manager|scheduler))$'"

Follow Up

If OpenShift started correctly and the oc commands are working again, the secrets must be refreshed, as they still hold the old certificates.

To refresh the certificates, follow the solution linked below:
How to renew etcd certificates in OpenShift 4.8 when certificates are not expired

Root Cause

The etcd certificates in OpenShift 4.6, 4.7 and 4.8 are not automatically rotated. They expire after 3 years.

The issue is fixed in OpenShift 4.9 and later, where automatic rotation of the etcd certificates is implemented.

Diagnostic Steps

To verify whether the etcd certificates are expired, run the following commands.

SSH to the control plane nodes:
# ssh core@masterN

# find /etc/kubernetes/static-pod-resources/etcd-certs/secrets -iname "etcd*.crt" -exec openssl x509 -noout -enddate -in {} \;

notAfter=Jul 21 19:46:38 2026 GMT
notAfter=Jul 21 19:46:38 2026 GMT
notAfter=Jul 21 19:46:38 2026 GMT
notAfter=Jul 21 19:46:37 2026 GMT
notAfter=Jul 21 19:46:38 2026 GMT
notAfter=Jul 21 19:46:39 2026 GMT
...
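
Reading many notAfter lines by eye is easy to get wrong. The following sketch turns one such line into a yes/no answer. It assumes GNU date (for the -d option), and the function name cert_expired is hypothetical. The optional second argument is the reference time in epoch seconds, defaulting to the current time.

```shell
# Sketch: return 0 if the notAfter date is in the past, 1 if not, 2 if it cannot be parsed.
# Usage: cert_expired "notAfter=Jul 21 19:46:38 2026 GMT" [now_epoch]
cert_expired() {
  end=${1#notAfter=}
  now=${2:-$(date +%s)}
  end_epoch=$(date -d "$end" +%s) || return 2
  [ "$end_epoch" -le "$now" ]
}

# Example:
# cert_expired "notAfter=Jul 21 19:46:38 2026 GMT" && echo "expired"
```

When the certificate file itself is at hand, openssl x509 -checkend 0 -noout -in <file> gives the same answer without any date parsing.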

Additionally, if etcd is still up, check the certificate served by each member:

# for each in $(grep ETCDCTL_ENDPOINTS /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-scripts/etcd.env | sed 's/^.*="//g;s/"//g;s/,/ /g'); do echo "etcd member $each"; echo -n |openssl s_client -connect $(echo $each | sed 's/https:\/\///g') -showcerts 2>/dev/null|openssl x509 -noout -dates; echo "----"; done

etcd member https://10.0.1.1:2379
notBefore=Jul 22 19:46:38 2023 GMT
notAfter=Jul 21 19:46:39 2026 GMT
----
etcd member https://10.0.1.2:2379
notBefore=Jul 22 19:46:37 2023 GMT
notAfter=Jul 21 19:46:38 2026 GMT
----
etcd member https://10.0.1.3:2379
notBefore=Jul 22 19:46:37 2023 GMT
notAfter=Jul 21 19:46:38 2026 GMT
----
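
The one-liner above first turns the ETCDCTL_ENDPOINTS value into a space-separated list and then strips the https:// scheme for openssl s_client. That parsing step is sketched below in isolation. The function name etcd_endpoints is hypothetical, and the sample line in the comment is an assumption about how etcd.env is laid out, inferred from the sed expression.

```shell
# Sketch: print the endpoints from an etcd.env-style file as a space-separated list.
# Assumed input line shape: export ETCDCTL_ENDPOINTS="https://10.0.1.1:2379,https://10.0.1.2:2379"
# Usage: etcd_endpoints <path/to/etcd.env>
etcd_endpoints() {
  grep ETCDCTL_ENDPOINTS "$1" | sed 's/^.*="//g;s/"//g;s/,/ /g'
}

# Example: list host:port pairs suitable for "openssl s_client -connect"
# for each in $(etcd_endpoints /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-scripts/etcd.env); do
#   echo "${each#https://}"
# done
```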
