etcd pod is failing to start after updating to OpenShift Container Platform 4.9.28 or 4.10.9

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.9.28 or later
    • 4.10.9 or later

Issue

  • After updating to OCP 4.9.28 or 4.10.9 (or later), one etcd pod fails to start and the etcd operator is in a degraded state.
  • The failing etcd pod reports "found data inconsistency with peers" in its logs.

Resolution

Root Cause

  • With Red Hat OpenShift Container Platform 4.9.28 and 4.10.9, Red Hat introduced the --experimental-initial-corrupt-check=true flag for etcd to detect etcd members that may have a corrupt etcd database.
    • The --experimental-initial-corrupt-check=true flag may prevent problematic etcd members from starting, in which case "found data inconsistency with peers" messages are reported in the member's logs. The faulty etcd member also causes the etcd Cluster Operator to enter a degraded state.
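
To confirm that a failing member is hitting this check, its logs can be searched for the message quoted above. The following is a minimal sketch, not part of the original solution: the function name is hypothetical, and it assumes only that the message text appears somewhere in a log line, without assuming anything else about the etcd log format.

```python
# Message text quoted in this article. Nothing else about the log
# format is assumed.
INCONSISTENCY_MESSAGE = "found data inconsistency with peers"


def find_inconsistency_lines(log_text):
    """Return every line of log_text that contains the inconsistency message.

    Hypothetical helper for illustration only.
    """
    return [
        line
        for line in log_text.splitlines()
        if INCONSISTENCY_MESSAGE in line
    ]
```

One possible way to use it, assuming the etcd pods run in the openshift-etcd namespace with a container named etcd: save the output of "oc logs -n openshift-etcd <etcd-pod-name> -c etcd" to a file, read the file into a string, and pass it to find_inconsistency_lines. An empty result suggests the member is failing for a different reason.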
