Memory Ballooning and OpenShift

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

Summary

Memory ballooning is not recommended and is strongly discouraged for OpenShift clusters, especially for the control plane.

As a best practice for OpenShift clusters, OpenShift control-plane nodes should have committed memory equal to or greater than the published minimum resource requirements for a cluster installation. If memory ballooning is enabled, cluster-wide undefined behaviors, instabilities, and service degradation can occur. Worker nodes should have a minimum reservation equal to or greater than the published minimum resource requirements for a cluster installation.

Note: The minimum CPU and Memory requirements do not account for resources required by user workloads.
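As a quick sanity check, a node's total memory can be compared against the published minimum. The sketch below assumes a 16 GiB control-plane minimum, which matches current OpenShift documentation at the time of writing; verify the figure against the documentation for your cluster version.

```shell
# Sketch: compare this node's total memory against an assumed published
# minimum (16 GiB for control-plane nodes; verify for your version).
MIN_KIB=$((16 * 1024 * 1024))                          # 16 GiB in KiB
total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
if [ "$total_kib" -ge "$MIN_KIB" ]; then
  echo "OK: ${total_kib} KiB meets the ${MIN_KIB} KiB minimum"
else
  echo "WARN: ${total_kib} KiB is below the ${MIN_KIB} KiB minimum"
fi
```

Note that with ballooning enabled, `MemTotal` inside the guest can overstate what is actually backed by host physical memory, which is part of why the practice is discouraged.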

Background

Memory ballooning is a technique used in virtualization to optimize memory usage. It essentially allows the physical host machine to reclaim unused memory from virtual machines (VMs) and make it available for other VMs that need it more.

Technically, enabling memory ballooning requires a balloon driver to be installed inside the VM's guest OS. The host communicates with the balloon driver when it detects low physical memory. The balloon driver allocates unused memory within the VM into a special pool. The driver informs the host's hypervisor about this unused memory. The hypervisor then unmaps the physical memory associated with that unused memory, making it available for other VMs.
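On a RHEL/RHCOS guest running under VMware, the balloon driver described above ships as the `vmw_balloon` kernel module. A minimal sketch for checking whether it is loaded (on non-VMware hosts, or inside a container without module listing, it will simply report "not loaded"):

```shell
# Sketch: check whether the VMware balloon driver module is loaded
# in this guest. lsmod may be unavailable in minimal containers,
# hence the stderr redirect and fallback.
if lsmod 2>/dev/null | grep -q '^vmw_balloon'; then
  balloon_driver="loaded"
else
  balloon_driver="not loaded"
fi
echo "vmw_balloon: ${balloon_driver}"
```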

The end result is that the VM temporarily "returns" unused memory to the host. This dynamic process means the balloon driver can shrink or grow the memory pool based on the VM's current needs.

Memory Ballooning Issues for OpenShift/Kubernetes

OpenShift already shares the memory assigned to each node among its pods, and pods may need to be rescheduled onto different nodes for various reasons. Forcing the nodes to also contend with memory ballooning can slow that process down.

When memory ballooning is in use and the host memory is over-provisioned, ballooning eventually reaches a point where it cannot reclaim any more memory, affecting processes running within the VM. At that point, the operating system on the VM reacts in various ways:

  • The VM's operating system resorts to swapping. This means it moves inactive memory pages from RAM to disk storage to free up space for active processes.

  • Processes might experience delays and slowdowns and become less responsive. Due to increased swapping activity, they may take longer to complete tasks or require more time to load data from disk.

    • OpenShift/Kubernetes nodes run with swap disabled by default, so this step is skipped and the system proceeds directly to the OOM-kill step.
  • The operating system employs the Out-Of-Memory (OOM) killer as a last resort. The OOM killer prioritizes processes based on various factors, such as memory usage and criticality. It then terminates the process with the lowest priority to free up memory. If a critical process gets terminated, this can lead to unexpected application crashes or data loss.

    • Depending on the application, memory pressure might lead to instability or crashes within the application itself.

      • This is the case for some critical core components on the OpenShift/Kubernetes control plane.
    • Some applications might attempt to gracefully shut down or save their state before the OOM killer terminates them.
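To see whether the OOM killer described above has fired on a node, the kernel ring buffer can be searched for its log messages. A minimal sketch (reading the ring buffer may require elevated privileges, in which case the count simply comes back as zero):

```shell
# Sketch: count OOM-killer events recorded in the kernel ring buffer.
# grep -c prints 0 when nothing matches (but exits non-zero), and
# dmesg errors are suppressed so the count is always populated.
oom_events=$(dmesg 2>/dev/null | grep -ci 'out of memory' || true)
echo "OOM events in kernel ring buffer: ${oom_events:-0}"
```

On a systemd host, `journalctl -k` can be used in place of `dmesg` to search older boots as well.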

To check whether memory ballooning is enabled, and how much memory it consumes from the nodes, refer to how to find out what amount of memory a VMware balloon driver has consumed from a virtualized server.
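From inside a guest with VMware Tools installed, the amount of ballooned memory can also be queried directly. A sketch, with a fallback for hosts where the tools are not present:

```shell
# Sketch: query ballooned memory from inside the guest.
# `vmware-toolbox-cmd stat balloon` reports the memory (in MB)
# currently reclaimed by the balloon driver.
if command -v vmware-toolbox-cmd >/dev/null 2>&1; then
  balloon_stat=$(vmware-toolbox-cmd stat balloon)
else
  balloon_stat="VMware Tools not installed"
fi
echo "$balloon_stat"
```

A persistently non-zero value on an OpenShift node indicates the host is actively reclaiming memory from it, which is exactly the condition this article recommends avoiding.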
