Check usage of etcd 3.3 in OpenShift cluster

OpenShift 3.x has not been qualified by Red Hat to operate using either the etcd 3.3 server or an etcd 3.3 datastore. Due to a mislabelling of artefacts internal to Red Hat, etcd 3.3 was released in RPM and container image format for a short time. More information on the incident can be found in the etcd Incident Response article.

This article guides customers through checking their OpenShift clusters for the presence of the etcd 3.3 binary, in case it was installed during maintenance operations while it was available. Correctly diagnosing whether etcd 3.3 is in use, and consequently whether the etcd ‘cluster version’ (the version of the datastore) has been upgraded to 3.3, determines the procedure required to remediate.

The check can be carried out from any of the OpenShift cluster master nodes. The remediation procedure differs depending on whether all etcd members have been upgraded to 3.3.x, which in turn upgrades the etcd cluster version. The following chart outlines the process to diagnose and remediate:

![title="diagnosis flow"](https://access.redhat.com/sites/default/files/images/etcd_flow.png)

Check etcd cluster version

The etcdctl3 command on the OpenShift Master nodes reports which version of etcd is running on each of the master nodes. etcd upgrades its ‘cluster version’ once all etcd members have been upgraded to the same version.

The following commands should be used to output a table displaying etcd status:

# ETCD_ALL_ENDPOINTS=`etcdctl3 --write-out=fields member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS  endpoint status  --write-out=table

If any row in the resulting table shows etcd 3.3.11 in the ‘VERSION’ column, action must be taken to remediate. If the issue is not remediated, the cluster will not be in a supported state and future OpenShift updates will prevent the continued operation of the cluster.
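The VERSION column check can also be scripted rather than read by eye. The following is a minimal sketch, not a supported tool: find_etcd33_members is a hypothetical helper name, and it assumes the table layout shown in this article, where VERSION is the third column and each data row starts with a ‘|’ character.

```shell
# Hypothetical helper: reads `endpoint status --write-out=table` output on
# stdin and prints "ENDPOINT VERSION" for every member reporting etcd 3.3.x.
# Assumes the column order ENDPOINT | ID | VERSION | ... shown in this article.
find_etcd33_members() {
    awk -F'|' '
        # Data rows start with "|"; border rows start with "+".
        substr($0, 1, 1) == "|" && $2 !~ /ENDPOINT/ {
            endpoint = $2
            version = $4
            gsub(/ /, "", endpoint)
            gsub(/ /, "", version)
            if (version ~ /^3\.3\./) print endpoint, version
        }'
}

# Usage on a master node (not executed here):
#   etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint status --write-out=table | find_etcd33_members
```

No output means that no member reported a 3.3.x version at the time of the check.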

If there is no etcd 3.3.11 present in the cluster, the administrator should ensure that neither of the following container images or RPMs is available in any repository that was populated by mirroring Red Hat hosted content (such as with Satellite):

  • Container Image: registry.access.redhat.com/rhel7/etcd:3.2.22 with Image ID ‘4fd7e8980174’
  • RPM: etcd-3.3.11-2.el7

The following examples demonstrate expected output and the associated article for remediation of a non-compliant cluster.

Example 1

The following output is an example where all etcd cluster members are running version 3.3.11:

# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS  endpoint status  --write-out=table 
+-----------------------------------+------------------+---------+---------+-----------+-----------+------------+
|           ENDPOINT                |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------------+------------------+---------+---------+-----------+-----------+------------+
|     https://master1.etcd.com:2379 | d91b1c20df818655 |  3.3.11 |   17 MB |      true |         6 |       7863 |
|           https://10.0.88.33:2379 |  d35cfd2fedc078f |  3.3.11 |   17 MB |     false |         6 |       7863 |
|           https://10.0.88.22:2379 | c9624828ed10ae36 |  3.3.11 |   17 MB |     false |         6 |       7863 |
|           https://10.0.88.11:2379 | d91b1c20df818655 |  3.3.11 |   17 MB |      true |         6 |       7863 |
+-----------------------------------+------------------+---------+---------+-----------+-----------+------------+

Required Action: The etcd datastore needs to be downgraded to 3.2.22 and etcd container images/RPMs need to be replaced on the Master nodes. This procedure will require downtime. The instructions to carry it out are found in OpenShift's etcd cluster version was updated to 3.3.

Example 2

The following output is an example where not all etcd cluster members are running version 3.3.11 (the command has been omitted for brevity). In this situation only one cluster member is running etcd 3.3.11, so the cluster as a whole will not have negotiated a cluster version upgrade to 3.3:

+-------------+------------------+---------+---------+-----------+-----------+------------+
|  ENDPOINT   |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------+------------------+---------+---------+-----------+-----------+------------+
| etcd-1:2379 | bf42fc71aea495aa |  3.2.22 |   25 kB |     false |         2 |          8 |
| etcd-2:2379 | 3306e07331346cc3 |  3.2.22 |   25 kB |     false |         2 |          8 |
| etcd-3:2379 | aa81aa5294b3b572 |  3.3.11 |   20 kB |      true |         2 |          8 |
+-------------+------------------+---------+---------+-----------+-----------+------------+

Required Action: The OCP Master nodes running 3.3.11 need to be downgraded to version 3.2.22. Since the cluster version has not been upgraded, the procedure can be carried out without downtime by following the instructions in How can I downgrade an etcd member when the cluster version is lower than the etcd member version.
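The two examples above differ only in whether every member reports 3.3.x, and that difference decides which remediation article applies. The following sketch shows one way to make that distinction from the table output. classify_etcd_cluster and its result labels (all, partial, none, unknown) are hypothetical names chosen for illustration, and the same table layout as in the examples is assumed.

```shell
# Hypothetical helper: reads `endpoint status --write-out=table` output on
# stdin and prints one word describing how many members report etcd 3.3.x:
#   all     - every member is on 3.3.x (as in Example 1)
#   partial - some members are on 3.3.x (as in Example 2)
#   none    - no member is on 3.3.x
#   unknown - no data rows were found in the input
classify_etcd_cluster() {
    awk -F'|' '
        substr($0, 1, 1) == "|" && $2 !~ /ENDPOINT/ {
            version = $4
            gsub(/ /, "", version)
            total++
            if (version ~ /^3\.3\./) on33++
        }
        END {
            if (total == 0)         print "unknown"
            else if (on33 == 0)     print "none"
            else if (on33 == total) print "all"
            else                    print "partial"
        }'
}

# Usage on a master node (not executed here):
#   etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint status --write-out=table | classify_etcd_cluster
```

The result is only a hint for choosing between the two Required Action paths; the linked articles remain the reference for the actual procedure.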

Ensure RPM is not replicated to Satellite repository

Red Hat Satellite installations used to provide package updates to RHEL hosts, such as OCP Master nodes, may have synced the etcd-3.3.11-2.el7 RPM during the short period it was available. The following commands should be used to check if your Satellite is providing this package:

# yum clean metadata
# yum list etcd

Example

The following output is an abbreviated example showing that the Satellite has the etcd-3.3.11-2.el7 package available, but that it has not yet been installed on the system:

# yum clean metadata
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Cleaning repos: [...]
# yum list etcd
Loaded plugins: product-id, search-disabled-repos, subscription-manager
rhel-7-server-extras-rpms                                | 2.0 kB     00:00     
rhel-7-server-rpms                                       | 2.0 kB     00:00     
[...]
Installed Packages
etcd.x86_64               3.2.22-1.el7                @rhel-7-server-extras-rpms
Available Packages
etcd.x86_64               3.3.11-2.el7                rhel-7-server-extras-rpms 

Required Action: The yum configuration on the node needs to be updated to exclude etcd-3.3. This procedure requires no downtime and the instructions can be found in Blocking etcd 3.3 from being installed from Satellite.
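When many nodes need to be checked, the yum output can be tested in a script. The following is a minimal sketch under stated assumptions: etcd33_available is a hypothetical helper name, and the input is assumed to follow the `yum list etcd` layout shown above, with the package name and version on the same line under an ‘Available Packages’ heading.

```shell
# Hypothetical helper: reads `yum list etcd` output on stdin and returns
# exit status 0 if an etcd 3.3.x package is listed under "Available Packages",
# or 1 otherwise. Assumes name and version appear on the same line.
etcd33_available() {
    awk '
        /^Installed Packages/ { section = "installed"; next }
        /^Available Packages/ { section = "available"; next }
        section == "available" && $1 ~ /^etcd\./ && $2 ~ /^3\.3\./ { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage on a node (not executed here):
#   yum clean metadata
#   yum list etcd | etcd33_available && echo "etcd 3.3 is offered by the repository"
```

A zero exit status indicates that the repository is offering the 3.3 package to that node, which is the situation the Required Action above addresses.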
