What are CPU "C-states" and how to disable them if needed?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 9
  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 5

Issue

  • What are C-states, cstates, or C-modes?
  • How can I disable processor sleep states?
  • How can I prevent the kernel from overriding the BIOS C-state setting?
  • Is intel_idle.max_cstate=0 required to disable CPU C-states?

Resolution

To limit a CPU to a certain C-state, pass the processor.max_cstate=X option on the kernel line of /boot/grub/grub.conf in RHEL 6 and earlier, or on the GRUB_CMDLINE_LINUX line of /etc/sysconfig/grub in RHEL 7, 8, and 9. On RHEL 7, 8, and 9, you must then regenerate the GRUB configuration with grub2-mkconfig. In all cases the new limit takes effect at the next boot.

Here we limit the system to only C-State 1:

    kernel /vmlinuz-2.6.18-371.1.2.el5 ... processor.max_cstate=1

On some systems, the kernel can override the BIOS setting, and the parameter intel_idle.max_cstate=0 may also be required to ensure that deeper sleep states are not entered:

    kernel /vmlinuz-2.6.32-431.el6.x86_64 ... processor.max_cstate=1 intel_idle.max_cstate=0
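On RHEL 7 and later, the same parameters go on the GRUB_CMDLINE_LINUX line instead of a kernel line. The following is a minimal sketch of that edit, not an official procedure. The helper name add_cmdline_arg is hypothetical, and the sketch assumes the line is enclosed in double quotes and that the argument contains no "/" or "&" characters.

```shell
#!/bin/bash
# Hypothetical helper: append one kernel argument to GRUB_CMDLINE_LINUX in a
# grub defaults file, unless that exact argument is already present.
# Usage: add_cmdline_arg FILE ARG
add_cmdline_arg() {
    local file=$1 arg=$2 tmp rc

    # Nothing to do if the argument is already on the line (whole-word match).
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qwF -- "$arg"; then
        return 0
    fi

    # Insert the argument just before the closing double quote. Writing back
    # with cat keeps a symlinked file (such as /etc/sysconfig/grub) intact.
    tmp=$(mktemp) || return 1
    sed "/^GRUB_CMDLINE_LINUX=/ s/\"\$/ ${arg}\"/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example, to be run as root (not executed here):
#   add_cmdline_arg /etc/sysconfig/grub processor.max_cstate=1
#   add_cmdline_arg /etc/sysconfig/grub intel_idle.max_cstate=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # output path is an assumption;
#                                            # it can differ on UEFI systems
```

The helper is idempotent, so running it twice with the same argument leaves the file unchanged. A reboot is still needed before the new limit applies.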

The maximum allowed CPU C-State can be checked by verifying the presence of max_cstate boot parameters within the /proc/cmdline file:

    # grep max_cstate /proc/cmdline
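Because grep prints the whole command line, the relevant parameters can be hard to spot. As a small sketch (the function name show_max_cstate is hypothetical), the command line can be split into one parameter per line before filtering:

```shell
#!/bin/bash
# Hypothetical helper: print every *max_cstate= parameter found on a kernel
# command line, one per line. Reads /proc/cmdline unless a file is given.
show_max_cstate() {
    local cmdline_file=${1:-/proc/cmdline}
    # Turn spaces into newlines so each boot parameter is on its own line.
    tr -s ' ' '\n' < "$cmdline_file" | grep 'max_cstate=' \
        || echo "no max_cstate parameter set"
}
```

Calling show_max_cstate with no argument inspects the running kernel. Passing a file name is only a convenience for trying the function on saved command lines.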

Why the OS might ignore BIOS settings

The OS might ignore BIOS settings depending on which idle driver is in use. With intel_idle (the default on Intel machines), the OS can ignore ACPI and BIOS settings; that is, the driver can re-enable C-states that were disabled in the BIOS. If intel_idle is disabled and the older acpi_idle driver is used instead, the OS should follow the BIOS settings. The intel_idle driver can be disabled by:

  • passing intel_idle.max_cstate=0 to kernel boot command line or
  • passing idle=* (where * can be e.g. poll, i.e. idle=poll)

In that case the intel_idle driver will not load during boot, and the machine should use the acpi_idle driver. To check which driver is in use:

    # cat /sys/devices/system/cpu/cpuidle/current_driver
    intel_idle
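The check can be wrapped in a short function that also states what the result implies for BIOS settings, following the explanation in this section. This is a sketch only: the function name report_idle_driver and its optional directory argument are hypothetical, and the messages paraphrase the text here rather than any kernel output.

```shell
#!/bin/bash
# Hypothetical helper: report the active cpuidle driver and what it implies
# for BIOS C-state settings. The optional argument overrides the sysfs
# directory, which makes the function easy to try against a test tree.
report_idle_driver() {
    local dir=${1:-/sys/devices/system/cpu/cpuidle}
    local driver

    # Fall back to "none" if the file is missing or unreadable.
    driver=$(cat "$dir/current_driver" 2>/dev/null) || driver=none

    case $driver in
        intel_idle) echo "intel_idle: the kernel may ignore BIOS C-state settings" ;;
        acpi_idle)  echo "acpi_idle: the kernel should follow BIOS C-state settings" ;;
        *)          echo "$driver: neither intel_idle nor acpi_idle is active" ;;
    esac
}
```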

Moreover, when acpi_idle is in use, it allows C1 even if processor.max_cstate=0 is requested. See the source code discussion below.

NOTE: Passing idle=poll or idle=halt also disables acpi_idle. With idle=poll, the kernel uses a polling idle loop that simply executes nop instructions, which locks the CPUs in C0. With idle=halt, the kernel uses the hlt instruction, which limits the CPUs to C1.

CAUTION: Do not lock CPUs to the C0 state (the POLL state, which is equivalent to idle=poll). Locking all CPUs to C0 has negative consequences, including higher power consumption and increased heat generation. It might also void the CPU warranty. Please check with the respective hardware vendor.

About the meaning of processor.max_cstate and intel_idle.max_cstate

In order to understand those 2 parameters better, one can refer to the www.kernel.org document and the kernel source code.

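The parameters discussed here are processor.max_cstate=1 intel_idle.max_cstate=0. The kernel source files quoted below are provided by the kernel-debuginfo-common-x86_64 package.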

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

(...)
        intel_idle.max_cstate=  [KNL,HW,ACPI,X86]
                        0       disables intel_idle and fall back on acpi_idle.
                        1 to 9  specify maximum depth of C-state.
(...)
        processor.max_cstate=   [HW,ACPI]
                        Limit processor to maximum C-state
                        max_cstate=9 overrides any DMI blacklist limit.

Note that intel_idle.max_cstate=0 disables intel_idle and lets acpi_idle (processor.max_cstate) take over.

vim /usr/src/debug/kernel-3.10.0-693.19.1.el7/linux-3.10.0-693.19.1.el7.x86_64/drivers/idle/intel_idle.c
(...)
    /*
     * intel_idle_probe()
     */
    static int __init intel_idle_probe(void)
    {
            unsigned int eax, ebx, ecx;
            const struct x86_cpu_id *id;

            if (max_cstate == 0) {
                    pr_debug(PREFIX "disabled\n");
                    return -EPERM;
            }
(...)

Next, looking at the acpi processor_idle code:

vim /usr/src/debug/kernel-3.10.0-693.19.1.el7/linux-3.10.0-693.19.1.el7.x86_64/drivers/acpi/processor_idle.c
(...)
            if (max_cstate == 0)
                    max_cstate = 1;
(...)

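In other words, acpi_idle raises a max_cstate of 0 to 1, and intel_idle.max_cstate=0 disables intel_idle in both cases. This means that processor.max_cstate=0 intel_idle.max_cstate=0 and processor.max_cstate=1 intel_idle.max_cstate=0 are exactly equivalent.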
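To make the equivalence concrete, the clamp from the processor_idle.c excerpt above can be modelled in a few lines of shell. This is only an illustration of that one check, not a reimplementation of the driver, and the function name effective_acpi_max_cstate is hypothetical.

```shell
#!/bin/bash
# Illustrative model of the acpi_idle check quoted above:
# a requested processor.max_cstate of 0 is raised to 1.
effective_acpi_max_cstate() {
    local requested=$1
    if [ "$requested" -eq 0 ]; then
        echo 1
    else
        echo "$requested"
    fi
}

# Both settings end up with the same effective limit.
effective_acpi_max_cstate 0
effective_acpi_max_cstate 1
```

Both calls print 1, which is why the two parameter combinations behave identically once intel_idle is disabled.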

Note: Some C-states, such as C1E, can be enabled only when the intel_idle driver is in use. In that case the driver can enable them even if they are disabled in the BIOS.

Using tuned to set CPU C-States

Using tuned profiles gives improved flexibility and removes the need for a restart. In Red Hat Enterprise Linux, it is possible to set C-States by simply changing a tuned profile. For more details about this option, see the tuned documentation for your RHEL release.

Root Cause

In order to save energy when the CPU is idle, the CPU can be commanded to enter a low-power mode. Each CPU has several power modes, collectively called “C-states” or “C-modes”.

The low-power mode was first introduced with the 486DX4 processor. Since then, more power modes have been introduced, and each mode has been enhanced so that the CPU consumes less power while in it. The idea behind these modes is to cut the clock signal and power from idle units inside the CPU. The more units are stopped (by cutting the clock, reducing the voltage, or shutting them down completely), the more energy is saved. On the other hand, the CPU then needs more time to “wake up” and become fully operational again.

C-states start at C0, which is the normal CPU operating mode, i.e., the CPU is 100% turned on. The higher the C number, the deeper the CPU sleep mode: more circuits and signals are turned off, and the CPU needs more time to return to C0, i.e., to wake up. Each mode is also known by a name, and several of them have sub-modes with different levels of power saving and therefore different wake-up times.

Mode | Name | What it does | CPUs
-----|------|--------------|-----
C0 | Operating State | CPU fully turned on | All CPUs
C1 | Halt | Stops CPU main internal clocks via software; bus interface unit and APIC are kept running at full speed | 486DX4 and above
C1E | Enhanced Halt | Stops CPU main internal clocks via software and reduces CPU voltage; bus interface unit and APIC are kept running at full speed | All socket 775 CPUs
C1E | -- | Stops all CPU internal clocks | Turion 64, 65-nm Athlon X2 and Phenom CPUs
C2 | Stop Grant | Stops CPU main internal clocks via hardware; bus interface unit and APIC are kept running at full speed | 486DX4 and above
C2 | Stop Clock | Stops CPU internal and external clocks via hardware | Only 486DX4, Pentium, Pentium MMX, K5, K6, K6-2, K6-III
C2E | Extended Stop Grant | Stops CPU main internal clocks via hardware and reduces CPU voltage; bus interface unit and APIC are kept running at full speed | Core 2 Duo and above (Intel only)
C3 | Sleep | Stops all CPU internal clocks | Pentium II, Athlon and above, but not on Core 2 Duo E4000 and E6000
C3 | Deep Sleep | Stops all CPU internal and external clocks | Pentium II and above, but not on Core 2 Duo E4000 and E6000; Turion 64
C3 | AltVID | Stops all CPU internal clocks and reduces CPU voltage | AMD Turion 64
C4 | Deeper Sleep | Reduces CPU voltage | Pentium M and above, but not on Core 2 Duo E4000 and E6000 series; AMD Turion 64
C4E/C5 | Enhanced Deeper Sleep | Reduces CPU voltage even more and turns off the memory cache | Core Solo, Core Duo and 45-nm mobile Core 2 Duo only
C6 | Deep Power Down | Reduces the CPU internal voltage to any value, including 0 V | 45-nm mobile Core 2 Duo only
C7 | Deep Energy Saving | The CPU tries to flush its L3 cache. If the L3 cache is able to be entirely cleared, the CPU cuts its power to save energy. The power from the system agent is removed too. |
C7s | | When an MWAIT(C7) command is issued with a C7s sub-state hint, the entire L3 cache is flushed in one step as opposed to flushing the L3 cache in multiple steps. This also allows the system to send I/O devices to low power mode to reduce unnecessary power consumption when the system idles down. |
C8 | | The L3 cache is flushed in a single step. The power to the PLL is cut. |
C9 | | The VCCIN (VCC Input Voltage) gets lowered to a minimum. |
C10 | | The single phase core management system, VR12.6, goes into a low-power state. The CPU is almost shut down. |
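Which of these states a given machine actually exposes, and how long each takes to leave, can be read from the cpuidle directory in sysfs. The sketch below lists them for one CPU. The function name list_idle_states and its optional directory argument are hypothetical; the name and latency files are the per-state attributes the kernel provides under cpuidle, with latency reported in microseconds.

```shell
#!/bin/bash
# Hypothetical helper: list the idle states the kernel exposes for one CPU,
# together with each state's exit latency in microseconds. The optional
# argument overrides the cpuidle directory (defaults to cpu0).
list_idle_states() {
    local dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    local state

    for state in "$dir"/state[0-9]*; do
        # Skip the unexpanded pattern when no state directories exist.
        [ -r "$state/name" ] || continue
        printf '%s %s latency=%sus\n' \
            "${state##*/}" "$(cat "$state/name")" "$(cat "$state/latency")"
    done
}
```

Deeper states normally show larger latency values, which reflects the trade-off between power saving and wake-up time described above. If the function prints nothing, no cpuidle states are exposed for that CPU.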

Diagnostic Steps

If it is not possible to schedule a maintenance window to try disabling C-states, we advise asking your hardware vendor about the power savings configuration in the BIOS. The following should be checked:

  • Does the current BIOS have a "power savings" feature?
  • Is there any known bug related to the "power savings" feature that affects your BIOS version?
  • Is the "power savings" feature enabled? If yes, which mode is set?

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.