High CPU Utilization under %steal makes control plane unstable
Issue
Control plane nodes become unstable when CPU utilization under %steal (steal time) is high.
Resolution
- Engage your hypervisor team to cross-verify the physical resources available on the hypervisor against the CPUs/cores allocated to the VMs.
- Ask them to share their detailed analysis and opinion on this.
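As a rough aid for the conversation with the hypervisor team, the comparison between allocated vCPUs and physical cores can be sketched as a simple ratio. This is only an illustration: the VM names, vCPU counts, and core count below are hypothetical placeholders, not values from this case, and whether a given ratio is acceptable depends on the workload and on the hypervisor vendor's guidance.

```python
def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by physical cores on one hypervisor host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be a positive number")
    return sum(vcpus_per_vm.values()) / physical_cores


# Hypothetical inventory for a single hypervisor host (assumed values).
vms_on_host = {
    "master-0": 8,
    "master-1": 8,
    "master-2": 8,
    "worker-0": 16,
    "worker-1": 16,
}

ratio = overcommit_ratio(vms_on_host, physical_cores=28)
print(f"{ratio:.2f} vCPUs allocated per physical core")
```

A ratio above 1 means more vCPUs are allocated than there are physical cores, so the VMs can end up competing for CPU time.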
Root Cause
- The hypervisor is overcommitting CPU resources, which leads to high CPU utilization under %st (steal time) on all master VMs.
- This also results in a high load average being reported on them.
- Because of this unusual resource utilization on the VMs, oc commands fail, and logging in to the cluster fails through both the oc command and the console.
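To confirm the steal time described above on an affected VM, the percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`, where the eighth counter is steal. The following is a minimal sketch, not a supported tool: the function names and the 5-second default interval are assumptions made for illustration.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("cpu line has no steal counter")
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Return the share of CPU time reported as steal between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas["steal"] / total


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the steal percentage."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return steal_percent(first, second)
```

Calling `sample_steal()` on a master VM returns a figure that should roughly match the %st column shown by tools such as top over the same interval. A persistently high value is the evidence to share with the hypervisor team.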
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.