EtcdCertSignerControllerDegraded error on etcd operator
Environment
- Red Hat OpenShift Container Platform 4.x.
Issue
- The following certificate error is returned when executing oc describe co etcd.
- The same error message shows up multiple times, at least once for every etcd member.
- Server names and IP addresses have been hidden.
Message: EtcdCertSignerControllerDegraded: [SAN for the certificate <etcd_member_name> does not include <IP>: x509: certificate is valid for <IP>, not <IP>, SAN for the certificate [...]
Reason: EtcdCertSignerController_Error
Status: True
Type: Degraded
Resolution
- First, remove the affected secrets starting with etcd-serving-metrics-, etcd-serving and etcd-peer in the project openshift-etcd. You can list the candidates by executing the command below:
$ oc get secret -n openshift-etcd | grep -E 'etcd-serving|etcd-peer'
etcd-peer-ip-10-0-150-219.eu-central-1.compute.internal kubernetes.io/tls 2 19s
etcd-peer-ip-10-0-163-28.eu-central-1.compute.internal kubernetes.io/tls 2 18s
etcd-peer-ip-10-0-211-121.eu-central-1.compute.internal kubernetes.io/tls 2 18s
etcd-serving-metrics-ip-10-0-150-219.eu-central-1.compute.internal kubernetes.io/tls 2 19s
etcd-serving-metrics-ip-10-0-163-28.eu-central-1.compute.internal kubernetes.io/tls 2 18s
etcd-serving-metrics-ip-10-0-211-121.eu-central-1.compute.internal kubernetes.io/tls 2 18s
etcd-serving-ip-10-0-150-219.eu-central-1.compute.internal kubernetes.io/tls 2 19s
etcd-serving-ip-10-0-163-28.eu-central-1.compute.internal kubernetes.io/tls 2 18s
etcd-serving-ip-10-0-211-121.eu-central-1.compute.internal kubernetes.io/tls 2 18s
- You can check the certificate data by running the following (example for one certificate on a cluster with IPv4/IPv6 dual stack):
$ oc get secret etcd-serving-ip-10-0-163-28.eu-central-1.compute.internal -n openshift-etcd -o json | jq -r '.data."tls.crt" | @base64d' | openssl x509 -noout -text | grep -i Alt -A1
X509v3 Subject Alternative Name:
DNS:etcd.kube-system.svc, DNS:etcd.kube-system.svc.cluster.local, DNS:etcd.openshift-etcd.svc, DNS:etcd.openshift-etcd.svc.cluster.local, DNS:localhost, DNS:::1, DNS:10.0.163.28, DNS:127.0.0.1, DNS:2d00:9a00:6000:30c::28:343e, DNS:::1, IP Address:0:0:0:0:0:0:0:1, IP Address:10.0.163.28, IP Address:127.0.0.1, IP Address:2D00:9A00:6000:30C::28:343E, IP Address:0:0:0:0:0:0:0:1
- Remove only the secrets whose certificate Subject Alternative Name does not include the valid IPs for the node.
- To remove them, you can execute oc delete secret <secret_name>.
- Then update the advertised peer URLs to reflect the new IP addresses.
- If the master nodes have two NICs configured and the issue still occurs because of the second NIC's IP address even after updating the peer URLs, a workaround is to disable the secondary NIC on the nodes in order to proceed with the upgrade.
- If the steps described do not solve the problem or you find any other issue, please contact Red Hat Support to investigate the problem further.
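The manual check described in the steps above could be scripted roughly as follows. This is only a sketch, not a supported procedure. The function names secret_to_node, san_has_ip and report_stale_etcd_secrets are hypothetical, and the script assumes that each secret is named <prefix>-<node_name> (as in the listing above) and that oc, jq and openssl are available in a logged-in session. It only prints candidates and deletes nothing.

```shell
#!/bin/sh
# Sketch only: report etcd secrets whose certificate SAN lacks an InternalIP
# that the node currently reports. Nothing is deleted.

# Hypothetical helper: strip the secret-type prefix to recover the node name.
# Assumption: secrets are named <prefix>-<node_name>.
secret_to_node() {
  case "$1" in
    etcd-serving-metrics-*) printf '%s\n' "${1#etcd-serving-metrics-}" ;;
    etcd-serving-*)         printf '%s\n' "${1#etcd-serving-}" ;;
    etcd-peer-*)            printf '%s\n' "${1#etcd-peer-}" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# succeed if $1 appears as a whole "IP Address:" SAN entry.
# The match ignores case but is otherwise a plain string comparison.
san_has_ip() {
  tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -ixF "IP Address:$1" >/dev/null
}

# Hypothetical driver: walk the secrets in openshift-etcd and print every
# secret/IP pair where the node's InternalIP is missing from the SAN.
report_stale_etcd_secrets() {
  oc get secret -n openshift-etcd -o name | sed 's|^secret/||' |
  while read -r secret; do
    node=$(secret_to_node "$secret") || continue
    ips=$(oc get node "$node" \
      -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}') || continue
    san=$(oc get secret "$secret" -n openshift-etcd -o json |
      jq -r '.data."tls.crt" | @base64d' | openssl x509 -noout -text)
    for ip in $ips; do
      printf '%s\n' "$san" | san_has_ip "$ip" ||
        echo "$secret: SAN does not list $ip"
    done
  done
}
```

After sourcing the file, running report_stale_etcd_secrets prints one line per secret and missing address. Treat the output as a hint only: openssl may print IPv6 addresses in a different notation than the node reports (for example 0:0:0:0:0:0:0:1 instead of ::1), so IPv6 entries can be flagged even when the certificate is valid. Verify each candidate by hand before running oc delete secret.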
Root Cause
The problem has two possible root causes:
- The IP addresses of the master nodes are not persistent and changed after a reboot. This can happen, for example, during an upgrade.
  - Make sure that the IP addresses of your masters are persistent and never change, even when the nodes are rebooted. If your cluster is in AWS or Azure, you can check this with your cloud provider.
- A second IP address was configured on the master nodes after the cluster was installed.
  - This is a known bug. At the time of writing (latest version: 4.7), Red Hat Engineering is working on a fix. Please contact Red Hat Support for more information.
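To see which addresses each master currently reports, a command along these lines may help. It is a sketch: the function name list_master_internal_ips is hypothetical, and it assumes that the masters carry the node-role.kubernetes.io/master label.

```shell
# Hypothetical helper: print each master node name followed by the
# InternalIP addresses it currently reports, separated by a tab.
# Assumption: masters carry the node-role.kubernetes.io/master label.
list_master_internal_ips() {
  oc get nodes -l node-role.kubernetes.io/master \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
}
```

Comparing the output before and after a reboot shows whether an address changed. On a single-stack cluster, a node that lists more than one address may point to the second cause.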
If you have experienced this problem but none of the circumstances described apply, please contact Red Hat Support and report it.