How to delete all kubernetes.io/events in etcd


Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • etcd
    • 3
  • events

Issue

  • There is a large number of events in etcd.
  • etcd is firing the NOSPACE alarm.
  • The OpenShift API is unavailable, or the oc delete command is too slow to delete all events.

Resolution

Kubernetes events have a time-to-live of 3 hours, as explained in time-to-live for the events in an RHOCP cluster, so a large number of events is usually the consequence of another issue, not the cause. After deleting the events as a workaround, investigate what caused so many events to be created.
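As a hedged illustration, the TTL the cluster actually applies can be read from the kube-apiserver operator's observed configuration. This is a sketch; `show_event_ttl` is a helper name introduced here, and the exact location of the `event-ttl` argument may vary between versions:

```shell
# Print the event-ttl argument from the kube-apiserver operator's
# observed configuration (sketch; field location may vary by version).
show_event_ttl() {
  oc get kubeapiserver cluster -o yaml | grep -A 1 'event-ttl'
}
```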

IMPORTANT NOTE: Do not delete any resources other than events with this procedure. Only events should be deleted this way; deleting any other resource from etcd following this procedure could cause the cluster to fail. For other resources, investigate the reason for the large number of instances and delete them only with oc delete [resource_name] -n [namespace_name].
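When the API is still responsive, events in a single namespace can be removed through the supported client path instead. A minimal sketch; the function name and namespace are illustrative:

```shell
# Delete all events in one namespace through the API (supported path).
delete_ns_events() {
  ns="$1"
  oc delete events --all -n "${ns}"
}

# Example: delete_ns_events my-app-namespace
```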

Deleting events from etcd

Before deleting events from etcd, check that an etcd backup is available.

  1. Connect to the etcd pod from CoreOS:

    $ ssh -i <identity> core@master
    core@master$ sudo -i
    root@master# crictl exec -ti $(crictl ps --label "io.kubernetes.container.name=etcdctl" -q) /bin/sh
    
  2. Check number of events and etcd status (confirm that etcd is filled with events):

    $ etcdctl --command-timeout=60s get --prefix --keys-only / | awk -F/ '/./ { print $3 }' | sort | uniq -c | sort -n
    <truncated>
        410 serviceaccounts
        788 configmaps
        807 oauth
       2734 secrets
    3977840 events
    
  3. Check the etcd endpoint status; all endpoints must be reachable. A NOSPACE alarm may be firing:

    $ etcdctl endpoint status -w table
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
    |         ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS        |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
    | https://10.0.0.1:2379 | 123456789abcdef1 |  3.4.9  |  8.2 GB |     false |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
    | https://10.0.0.2:2379 | 123456789abcdef2 |  3.4.9  |  8.2 GB |      true |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
    | https://10.0.0.3:2379 | 123456789abcdef3 |  3.4.9  |  8.2 GB |     false |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
    
    $ etcdctl alarm list
    memberID:123456789abcdef1 alarm:NOSPACE
    memberID:123456789abcdef2 alarm:NOSPACE
    memberID:123456789abcdef3 alarm:NOSPACE
    
  4. Check which namespaces generate the largest number of events; this can help identify the root cause or misbehaving applications:

    $ etcdctl --command-timeout=60s get --prefix --keys-only / | awk -F/ '/./ { print $3 " " $4}' | grep events | sort | uniq -c | sort -n
    
  5. Delete all events from etcd. Run the following loop to start the deletion process:

        # Variable definition
        # COUNT: number of events deleted per request
        # FROM: first event key in etcd
        # TO: last event key to delete in the query
        # NUM: number of events deleted
    
        COUNT=10000
        FROM="$(etcdctl --command-timeout=60s get '/kubernetes.io/events/' --prefix --keys-only --limit 1)"
        while :; do
          TO="$(etcdctl get '/kubernetes.io/events/' --command-timeout=60s --prefix --keys-only --limit ${COUNT} | sed '/^$/d' | tail -1)"
          [ $(etcdctl get ${FROM} ${TO} --command-timeout=60s --keys-only | grep -vEc "^$|^/kubernetes.io/events/") -eq 0 ] && NUM=$(etcdctl --command-timeout=60s del ${FROM} ${TO}) || { echo "Non event key found, aborting..." ; break ;}
          [ "${NUM}" == "0" ] && echo "All events deleted" && break
          echo "${NUM} events deleted"
        done
    
        <truncated>
        9999 events deleted
        9999 events deleted
        411 events deleted
        All events deleted
    
  6. Free the disk space. Although all events have been deleted, etcd disk usage has not changed. To reclaim the space, etcd must be compacted and defragmented as explained in how to compact and defrag etcd to decrease database size in OpenShift 4.
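The compaction and defragmentation referenced in step 6 can be sketched as follows, run from inside the etcdctl container. This is a sketch only (`compact_and_defrag` is a name introduced here); refer to the linked article for the full, supported procedure:

```shell
# Compact the keyspace up to the current revision, defragment the member,
# and clear the NOSPACE alarm (sketch; run from the etcdctl container).
compact_and_defrag() {
  # Current revision, taken from the endpoint status JSON output.
  rev=$(etcdctl endpoint status --write-out json \
        | grep -oE '"revision":[0-9]+' | grep -oE '[0-9]+' | head -1)
  etcdctl compact "${rev}"
  etcdctl defrag --command-timeout=60s
  etcdctl alarm disarm
}
```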

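The per-namespace breakdown from step 4 can also be wrapped as a reusable filter over a key dump. A sketch; `events_per_namespace` is a name introduced here, assuming the `/kubernetes.io/events/<namespace>/<name>` key layout:

```shell
# Count events per namespace from an etcdctl --keys-only dump on stdin.
# Assumed key layout: /kubernetes.io/events/<namespace>/<event-name>
events_per_namespace() {
  awk -F/ '$2 == "kubernetes.io" && $3 == "events" { count[$4]++ }
           END { for (ns in count) print count[ns], ns }' | sort -n
}

# Usage:
#   etcdctl get /kubernetes.io/events/ --prefix --keys-only | events_per_namespace
```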
Root Cause

A large number of events is created due to cluster issues (such as operators trying to remediate the cluster state, or an internal or external component creating objects in an infinite loop). This can cause poor etcd performance with many leader re-elections. Investigate the underlying issue, as a large number of events could be created again in a short time.

Diagnostic Steps

  • Most API calls are failing with etcdserver: mvcc: database space exceeded or etcdserver: leader changed.

  • oc describe node takes several minutes to run, and oc get events is unusable in many namespaces.

  • Checking the etcd size shows that it is full (more than 8 GB, which is the default maximum):

      $ etcdctl endpoint status -w table
      +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
      |         ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS        |
      +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
      | https://10.0.0.1:2379 | 123456789abcdef1 |  3.4.9  |  8.2 GB |     false |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
      | https://10.0.0.2:2379 | 123456789abcdef2 |  3.4.9  |  8.2 GB |      true |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
    | https://10.0.0.3:2379 | 123456789abcdef3 |  3.4.9  |  8.2 GB |     false |      false |      4470 |    2273805 |            2273805 | alarm:NOSPACE |
      +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+---------------+
    
  • Checking the number of objects by type shows that etcd is full of events:

      $ etcdctl --command-timeout=60s get --prefix --keys-only / | awk -F/ '/./ { print $3 }' | sort | uniq -c | sort -n
      <truncated>
          410 serviceaccounts
          788 configmaps
          807 oauth
         2734 secrets
      3977840 events
    
  • Refer to how to list the number of objects and size in etcd on OpenShift for additional information about number and size of the resources in etcd.

