rafthttp: the clock difference against peer is too high in OpenShift etcd

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • etcd

Issue

  • Messages like the following appear in the etcd logs:

    W | rafthttp: the clock difference against peer xxxxxxxxxxxxxxxx is too high [4m18.466926704s > 1s]
    W | rafthttp: the clock difference against peer xxxxxxxxxxxxxxxx is too high [4m18.463381838s > 1s]
    
  • The liveness probes for the etcd pods fail.

Resolution

Use NTP servers to keep the clocks of all nodes synchronized, as explained in "configuring NTP/chrony in OpenShift 4".

If chrony is configured but the nodes are not synchronizing, refer to "chronyd is not synchronizing with NTP server" and "troubleshooting the NTP Chrony time service in Red Hat OpenShift Container Platform".
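A quick way to confirm whether chrony has actually synchronized on each control plane node is the Leap status field of chronyc tracking. It reports Normal once chronyd is synchronized and Not synchronised otherwise. The sketch below makes three assumptions: chronyd is the time service on the nodes, oc debug works against the control plane nodes, and the helper names chrony_synced and check_control_plane_chrony are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 only if the `chronyc tracking` output read
# on stdin contains "Leap status : Normal"; exits 1 otherwise.
chrony_synced() {
  awk -F': *' '/^Leap status/ { status = $2; sub(/[[:space:]]+$/, "", status) }
               END { exit !(status == "Normal") }'
}

# Hypothetical helper: run `chronyc tracking` on every control-plane node
# (assumes a working `oc debug` and chronyd on the nodes) and report the result.
check_control_plane_chrony() {
  local node
  for node in $(oc get nodes -l node-role.kubernetes.io/control-plane= -o name); do
    if oc debug -q "$node" -- chroot /host chronyc tracking | chrony_synced; then
      echo "$node: chrony synchronized"
    else
      echo "$node: chrony NOT synchronized, inspect 'chronyc sources -v' on this node"
    fi
  done
}
```

After sourcing the script, call check_control_plane_chrony. Any node reported as not synchronized is a candidate for the troubleshooting articles referenced above.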

Root Cause

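The system clocks of the control plane (master) nodes are not synchronized, and this time difference causes the issue. etcd logs the warning when the clock of a peer member differs from its own by more than 1s. In the example above, the difference is more than 4 minutes.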
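The durations in the etcd warnings can be converted to seconds to quantify the drift against the 1s limit. The sketch below makes two assumptions: the durations use Go's h/m/s notation, which is how Go prints values of 1s or more, and the helper names duration_to_seconds and clock_diffs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Go-style duration as printed in the etcd
# warning (for example 4m18.466926704s or 1h2m3s) into seconds.
# Assumes only h, m and s units (sub-second units such as ms are not handled).
duration_to_seconds() {
  awk -v d="$1" 'BEGIN {
    total = 0
    while (match(d, /^[0-9.]+[hms]/)) {
      n = substr(d, 1, RLENGTH - 1)   # numeric part
      u = substr(d, RLENGTH, 1)       # unit: h, m or s
      total += n * (u == "h" ? 3600 : (u == "m" ? 60 : 1))
      d = substr(d, RLENGTH + 1)      # continue with the remaining text
    }
    printf "%.3f\n", total
  }'
}

# Hypothetical helper: read etcd log lines on stdin and print
# "<peer-id> <seconds>" for every "clock difference against peer" warning.
clock_diffs() {
  sed -n 's/.*clock difference against peer \([^ ]*\) is too high \[\([^ ]*\) > 1s].*/\1 \2/p' |
    while read -r peer dur; do
      echo "$peer $(duration_to_seconds "$dur")"
    done
}
```

For example, duration_to_seconds 4m18.466926704s prints 258.467, about 258 times the 1s limit. Piping the output of the oc logs command from the diagnostic steps into clock_diffs lists the reported drift per peer.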

Diagnostic Steps

  • Check the logs of the etcd pods:

        $ oc logs etcd-master1.ocp.example.com -n openshift-etcd -c etcd | grep "the clock difference against peer"
    
        2021-09-24T06:39:16.408674158Z 2021-09-24 06:39:16.408617 W | rafthttp: the clock difference against peer xxxxxxxxxxxxxxxx is too high [4m18.466926704s > 1s]
        2021-09-24T06:39:16.465279570Z 2021-09-24 06:39:16.465225 W | rafthttp: the clock difference against peer xxxxxxxxxxxxxxxx is too high [4m18.463381838s > 1s]
    
  • Check the time on each master node:

        $ for NODE in $(oc get nodes -l node-role.kubernetes.io/control-plane= -o name); do echo "-------------- $NODE ------------"; oc debug -q ${NODE} -- chroot /host bash -c "hostname; echo; timedatectl"; echo; done
        
        -------------- node/master-0.openshift.example.com ------------
        master-0.openshift.example.com
        
                       Local time: Fri 2021-09-24 13:55:21 UTC
                   Universal time: Fri 2021-09-24 13:55:21 UTC
                         RTC time: Fri 2021-09-24 13:55:38
                        Time zone: UTC (UTC, +0000)
        System clock synchronized: no
                      NTP service: active
                  RTC in local TZ: no
    
        -------------- node/master-1.openshift.example.com ------------
        master-1.openshift.example.com
    
                       Local time: Fri 2021-09-24 13:55:43 UTC
                   Universal time: Fri 2021-09-24 13:55:43 UTC
                         RTC time: Fri 2021-09-24 13:56:01
                        Time zone: UTC (UTC, +0000)
        System clock synchronized: no
                      NTP service: active
                  RTC in local TZ: no
    
        -------------- node/master-2.openshift.example.com ------------
        master-2.openshift.example.com
    
                       Local time: Fri 2021-09-24 13:52:04 UTC                       ### high time difference with other nodes
                   Universal time: Fri 2021-09-24 13:52:04 UTC                       ### high time difference with other nodes
                         RTC time: Fri 2021-09-24 13:56:39
                        Time zone: UTC (UTC, +0000)
        System clock synchronized: no
                      NTP service: active
                  RTC in local TZ: no
    

    If oc debug node does not work, use SSH to connect to the IP addresses of the control plane nodes instead:

    $ oc get nodes -l node-role.kubernetes.io/control-plane= -o wide
    [...]
    $ export MASTER_IP='<IP-1> <IP-2> <IP-3>'
    $ for IP in $MASTER_IP; do ssh -o StrictHostKeyChecking=no core@$IP 'hostname; echo; timedatectl'; echo; done
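The loops above query the nodes one after another, so the printed times are also shifted by how long each oc debug or SSH call takes. The sketch below samples date +%s on all control plane nodes in parallel and prints the spread between the earliest and latest clock. It assumes a working oc debug, and the helper names are hypothetical. Pod start-up latency can still add a few seconds of noise, so this only detects gross skew such as the minutes-long offset shown above, not drift close to the 1s etcd limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print max(epoch) - min(epoch) for the
# "<node> <epoch-seconds>" lines read on stdin.
clock_spread() {
  awk '{ v = $2 + 0 }
       NR == 1 { min = max = v }
       { if (v < min) min = v; if (v > max) max = v }
       END { print max - min }'
}

# Hypothetical helper: sample `date +%s` on every control-plane node in
# parallel (assumes a working `oc debug`) and print "<node> <epoch-seconds>".
sample_control_plane_clocks() {
  local tmp node f
  tmp=$(mktemp -d)
  for node in $(oc get nodes -l node-role.kubernetes.io/control-plane= -o name); do
    oc debug -q "$node" -- chroot /host date +%s > "$tmp/${node#node/}" &
  done
  wait                                  # wait for all background oc debug calls
  for f in "$tmp"/*; do
    [ -e "$f" ] || continue             # no nodes matched the label
    echo "$(basename "$f") $(cat "$f")"
  done
  rm -rf "$tmp"
}
```

A possible invocation is sample_control_plane_clocks | tee /dev/stderr | clock_spread. This prints each node's sample, followed by the spread in seconds.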
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.