CFS quotas can lead to unnecessary throttling in OpenShift Container Platform

Solution Verified

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 3
    • 4
  • Red Hat Enterprise Linux (RHEL)
    • 7
    • 8

Issue

  • On OpenShift with hard CPU limits, we noticed that Java applications were not allowed to use their entire quota. This particularly affected heavily threaded applications. For example, an application assigned 1 CPU never used more than 0.1 CPU, yet it was throttled the entire time it was running. As a result, applications have to be massively over-provisioned, which wastes resources on the nodes.
  • CFS quotas can lead to unnecessary throttling
  • Is Red Hat affected by https://github.com/kubernetes/kubernetes/issues/67577?

Resolution

This issue has been addressed in Red Hat Enterprise Linux 7 and 8 and is therefore also considered resolved in Red Hat OpenShift Container Platform 3 and 4. To prevent the unnecessary throttling, install one of the errata listed below, or a newer kernel:

  • Red Hat Enterprise Linux 7.8 RHSA-2020:1016 or later.
    • Kernel version: kernel-3.10.0-1127.el7
  • Red Hat Enterprise Linux 7.7z RHSA-2019:4106
    • Kernel version: kernel-3.10.0-1062.9.1.el7
  • Red Hat Enterprise Linux 8 RHBA-2019:4282 or later.
    • Kernel version: kernel-4.18.0-147.3.1.el8_1
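
To check a node against the kernel builds listed above, a comparison along the following lines may help. This is a minimal sketch, not a supported tool: the helper names (parse_release, has_cfs_fix) are hypothetical, the input is assumed to look like the output of `uname -r`, and only the RHEL 7 and RHEL 8 builds named in this article are considered (other update streams and kernel variants are not covered).

```python
import re

# Lowest fixed kernel builds named in the errata list above,
# keyed by RHEL dist tag. Other streams are deliberately not covered.
FIXED_BUILDS = {
    "el7": "3.10.0-1062.9.1",
    "el8": "4.18.0-147.3.1",
}


def parse_release(release):
    """Split a `uname -r` style string into (numeric tuple, dist tag)."""
    match = re.search(r"\.(el\d+)", release)
    if match is None:
        raise ValueError(f"no RHEL dist tag found in {release!r}")
    version_part = release[: match.start()]
    numbers = tuple(int(n) for n in re.findall(r"\d+", version_part))
    return numbers, match.group(1)


def has_cfs_fix(release):
    """Return True if the release is at or above the listed fixed build."""
    numbers, dist = parse_release(release)
    fixed = FIXED_BUILDS.get(dist)
    if fixed is None:
        raise ValueError(f"no fixed build listed for {dist}")
    fixed_numbers = tuple(int(n) for n in re.findall(r"\d+", fixed))
    # Tuples compare element by element, so 1127 > 1062.9.1 and
    # 1062.4.1 < 1062.9.1, as expected for these build numbers.
    return numbers >= fixed_numbers
```

On a node, the function could be called as has_cfs_fix(platform.release()) after importing the standard platform module. A False result only means the running kernel is older than the builds listed here.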

Root Cause

It has been observed that highly threaded, non-CPU-bound applications running under cpu.cfs_quota_us constraints can be throttled in a high percentage of periods while not consuming their allocated quota. This use case is typical of user-interactive, non-CPU-bound applications, such as those running in Kubernetes or Mesos, when they run on multiple CPU cores.
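
The percentage of throttled periods mentioned above can be read from the cgroup's cpu.stat file. The following is a rough sketch that assumes the cgroup v1 field names nr_periods and nr_throttled; the function names are hypothetical, and the location of cpu.stat depends on the node and container runtime, so the text is passed in rather than read from a fixed path.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Illustrative values only, not taken from a real system.
sample = "nr_periods 1000\nnr_throttled 900\nthrottled_time 123456789\n"
print(throttled_ratio(parse_cpu_stat(sample)))
```

A ratio close to 1.0 combined with CPU usage well below the configured limit matches the symptom described in this article.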

The root cause is that each CPU-local run queue is allocated a slice of the bandwidth and then does not fully use that slice within the period, at which point the slice and the quota expire. Because the unused slice expires, applications cannot use the quota they were allocated.
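
The effect of expiring slices can be illustrated with a deliberately simplified model of a single period. This is a toy sketch, not the kernel's algorithm: it assumes a 100 ms period, a 5 ms slice, and threads that each run briefly on a different CPU, and it ignores any mechanism that returns unused runtime to the global pool. The function name and all numbers are illustrative assumptions.

```python
def simulate_period(quota_us, slice_us, cpus, used_per_cpu_us):
    """Toy model: each CPU takes one slice from the quota and uses part of it.

    Returns (runtime actually consumed, number of CPUs left without quota).
    Unused slice time is treated as lost when the period ends.
    """
    remaining = quota_us
    consumed = 0
    throttled_cpus = 0
    for _ in range(cpus):
        if remaining <= 0:
            # The global pool is empty, so this CPU's threads are throttled.
            throttled_cpus += 1
            continue
        granted = min(slice_us, remaining)
        remaining -= granted
        # Only part of the granted slice is actually used.
        consumed += min(used_per_cpu_us, granted)
    return consumed, throttled_cpus


# Quota of 1 CPU (100000 us per 100000 us period), 5000 us slices,
# 40 CPUs, each thread needing only 500 us per period.
consumed, throttled = simulate_period(100_000, 5_000, 40, 500)
print(consumed / 100_000, throttled)
```

In this model the first 20 CPUs drain the whole quota in slices while using only a tenth of it, and the remaining 20 CPUs are throttled, which is the same shape as the symptom reported in the Issue section.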

