etcd pod is restarting frequently.


Environment

  • Red Hat OpenShift Container Platform
    • 3.10.
    • 3.11.

Issue

  • The liveness probe for the master-etcd pod failed with errors such as:
rafthttp: the clock difference against peer XXXX is too high [1.46664075s > 1s]
rafthttp: the clock difference against peer XXXX is too high [3.281962067s > 1s]

Liveness probe for master-etcd-master.example.com(XXXX):etcd failed (failure): member XXXX is unhealthy: got unhealthy result from https://ip-address:2379

member XXXX is unhealthy: got unhealthy result from https://ip-address:2379
member XXXX is unhealthy: got unhealthy result from https://ip-address:2379
  • I/O timeout errors for the etcd members.
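The offsets quoted in the rafthttp messages can be pulled out quickly to gauge how far the clocks have drifted; a minimal sketch using a sample file (on a live master you would pipe the output of the `master-logs` command shown under Diagnostic Steps into the same filter instead):

```shell
# Extract the reported clock offsets (in seconds) from etcd clock-skew warnings.
# The sample lines mirror the messages above; /tmp/etcd-sample.log stands in
# for the real etcd pod log.
cat <<'EOF' > /tmp/etcd-sample.log
rafthttp: the clock difference against peer XXXX is too high [1.46664075s > 1s]
rafthttp: the clock difference against peer XXXX is too high [3.281962067s > 1s]
EOF
sed -n 's/.*\[\([0-9.]*\)s > 1s\].*/\1/p' /tmp/etcd-sample.log
```

Any value above 1 (the tolerance etcd prints in the message) will trigger the warning.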

Resolution

  • Keep the system clocks of all etcd hosts synchronized, for example by enabling chronyd or ntpd on every master, so that the drift between members stays well below the 1-second tolerance etcd reports. Once the clocks are in sync, the I/O timeouts and liveness probe failures should stop.

Root Cause

  • The etcd servers' clocks are out of sync with each other, which causes I/O timeouts between the members.
  • Because of the I/O timeouts, the liveness probe fails, which makes the etcd pod restart frequently.
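The skew described above can be measured directly by comparing timestamps taken on two members; a small sketch (the peer hostname is hypothetical, and in practice chronyc/ntpq give more precise figures):

```shell
# clock_skew A B -> absolute difference between two epoch timestamps, in seconds.
clock_skew() {
  awk -v a="$1" -v b="$2" 'BEGIN { d = a - b; if (d < 0) d = -d; printf "%.6f\n", d }'
}

# Against a live peer you might compare (hostname hypothetical):
#   clock_skew "$(date +%s.%N)" "$(ssh master2.example.com date +%s.%N)"
clock_skew 1700000001.50 1700000000.00
```

A result near or above 1 second matches the rafthttp warnings and points to NTP as the first thing to fix.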

Diagnostic Steps

  • Check the etcd pod logs:
/usr/local/bin/master-logs etcd etcd
  • Check the atomic-openshift-node service logs:
journalctl -u atomic-openshift-node --no-pager

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.