The etcd pod is restarting frequently.
Environment
- Red Hat OpenShift Container Platform 3.10
- Red Hat OpenShift Container Platform 3.11
Issue
- The liveness probe for the master-etcd pod failed:
rafthttp: the clock difference against peer XXXX is too high [1.46664075s > 1s]
rafthttp: the clock difference against peer XXXX is too high [3.281962067s > 1s]
Liveness probe for master-etcd-master.example.com(XXXX):etcd failed (failure): member XXXX is unhealthy: got unhealthy result from https://ip-address:2379
member XXXX is unhealthy: got unhealthy result from https://ip-address:2379
member XXXX is unhealthy: got unhealthy result from https://ip-address:2379
- I/O timeout errors for the etcd members.
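To confirm which member is unhealthy, cluster health can be queried from a master node. The following is a sketch assuming the etcd v2 tooling shipped with OCP 3.x; the certificate paths are the OpenShift 3.10/3.11 defaults and may differ in your environment:

```shell
# Check etcd cluster health from a master node (certificate paths
# are the OpenShift 3.x defaults; adjust if your install differs).
etcdctl --cert-file /etc/etcd/peer.crt \
        --key-file  /etc/etcd/peer.key \
        --ca-file   /etc/etcd/ca.crt \
        --endpoints "https://$(hostname):2379" \
        cluster-health
```

Any member reported as unhealthy here should correspond to the peer named in the liveness probe failures above.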
Resolution
- The first step is to stabilize the etcd cluster by syncing the clocks on all members so that there is no clock difference between the peers.
- This can be done in two ways:
  1. Enable NTP on all the master nodes.
  2. Manually sync the clocks on all the master nodes.
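A minimal sketch of option 1, assuming chrony is the time-sync daemon (the RHEL 7 default); run these on every master node:

```shell
# Install and enable NTP synchronization via chrony (RHEL 7 default daemon).
yum install -y chrony            # skip if chrony is already installed
systemctl enable --now chronyd   # start the time-sync service

# Verify synchronization:
chronyc tracking                 # shows offset from the selected NTP source
timedatectl status               # should report "NTP synchronized: yes"
```

Once the clocks converge, the `rafthttp: the clock difference against peer ... is too high` warnings should stop appearing in the etcd logs.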
Root Cause
- The etcd server logs show that the clocks are out of sync with each other, which is causing I/O timeouts.
- Due to the I/O timeouts, the liveness probe was failing, which made the etcd pod restart frequently.
Diagnostic Steps
- Check the etcd pod logs:
/usr/local/bin/master-logs etcd etcd
- Check the atomic-openshift-node service logs.
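On OCP 3.10/3.11 the node component runs as a systemd service, so its logs can be read with journalctl; a sketch of both diagnostic steps (run on the affected master):

```shell
# Node service logs (atomic-openshift-node is the OCP 3.x node service).
journalctl -u atomic-openshift-node --since "1 hour ago"

# Filter the etcd pod logs for the clock-skew warnings from the Issue section.
/usr/local/bin/master-logs etcd etcd 2>&1 | grep "clock difference"
```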
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.