Understanding workload hints
August 16 2024
The following table describes how combinations of the power consumption and real-time settings affect latency.
If the workload hints are configured manually and the realTime workload hint is not explicitly set, it defaults to true.
⚠️ Information: This baseline configuration is intended as a starting point and might change from release to release. It can also be changed by additional kernel arguments set in an applied PerformanceProfile or Tuned CR. For more information about the PerformanceProfile, see Creating a performance profile. For information about the 5G reference design PerformanceProfile, see Telco 5G RAN performance profile.
| Performance Profile creator setting | Hint | Environment | Description |
|---|---|---|---|
| Default | workloadHints:<br>highPowerConsumption: false<br>realTime: false | High throughput cluster without latency requirements | Performance achieved through CPU partitioning only. |
| low-latency | workloadHints:<br>highPowerConsumption: false<br>realTime: true | Regional datacenters | Both energy savings and low latency are desirable: a compromise between power management, latency, and throughput. |
| ultra-low-latency | workloadHints:<br>highPowerConsumption: true<br>realTime: true | Ultra-low-latency critical workloads | Optimized for absolute minimal latency and maximum determinism at the cost of increased power consumption. |
| per-pod power management | workloadHints:<br>realTime: true<br>highPowerConsumption: false<br>perPodPowerManagement: true | Critical and non-critical workloads | Allows for power management per pod. |
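The hint combinations in the table can be read as a small lookup. The following Python sketch is illustrative only: the names `WorkloadHints`, `hints_from_manual_config`, and `creator_setting` are hypothetical and not part of any OpenShift API. It applies the documented default of realTime to true for manual configuration, and it assumes that the other two hints are treated as false when they are not set.

```python
from typing import Mapping, NamedTuple


class WorkloadHints(NamedTuple):
    """The three workloadHints fields that appear in the table."""
    high_power_consumption: bool
    real_time: bool
    per_pod_power_management: bool = False


def hints_from_manual_config(raw: Mapping[str, bool]) -> WorkloadHints:
    """Build hints from a manually written workloadHints mapping."""
    return WorkloadHints(
        # Assumption: an unset highPowerConsumption hint is treated as false.
        high_power_consumption=raw.get("highPowerConsumption", False),
        # Per the text: an unset realTime hint defaults to true.
        real_time=raw.get("realTime", True),
        # Assumption: an unset perPodPowerManagement hint is treated as false.
        per_pod_power_management=raw.get("perPodPowerManagement", False),
    )


def creator_setting(hints: WorkloadHints) -> str:
    """Return the table row that matches a hint combination."""
    if hints.per_pod_power_management:
        # The table pairs this hint with realTime: true and
        # highPowerConsumption: false only.
        if hints.real_time and not hints.high_power_consumption:
            return "per-pod power management"
        return "not listed in the table"
    if hints.real_time and hints.high_power_consumption:
        return "ultra-low-latency"
    if hints.real_time:
        return "low-latency"
    if not hints.high_power_consumption:
        return "Default"
    return "not listed in the table"


print(creator_setting(WorkloadHints(False, False)))  # prints "Default"
print(creator_setting(hints_from_manual_config({"highPowerConsumption": True})))  # prints "ultra-low-latency"
```

Combinations that the table does not show are reported as such rather than guessed.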
The following table lists the kernel arguments that are configured when the default workload hint settings are applied. The arguments apply to all profiles, except where a note in the Description column states otherwise. The bracketed numbers refer to the entries under References.
| Argument | Description |
|---|---|
| skew_tick=1 [1] | Staggers the periodic timer ticks across CPUs so that they do not all fire at the same moment, which reduces lock contention and the latency spikes that it causes. |
| tsc=reliable [2] | Marks the timestamp counter (TSC) as a reliable clock source, which disables the kernel's runtime verification of the TSC. The TSC is a high-resolution per-CPU counter used for performance monitoring and timing purposes. |
| rcupdate.rcu_normal_after_boot=1 [3] | Switches read-copy update (RCU) to normal (non-expedited) grace periods after the system boot process completes. Expedited grace periods can interrupt other CPUs, so avoiding them after boot reduces disturbance to latency-sensitive workloads. |
| nohz=on [4] | Enables the NOHZ feature in the Linux kernel, which eliminates timer interrupts on idle CPUs to improve power efficiency and overall system performance. |
| rcu_nocbs=<list of isolated cpus> [5] | Designates the isolated CPUs as "no-CBs" (no-callback) CPUs. These CPUs do not execute RCU callbacks themselves. Instead, the callbacks are offloaded to kernel threads that can run on other CPUs, which keeps RCU processing away from latency-sensitive workloads. |
| tuned.non_isolcpus=<list of reserved cpus> [6] | Configures the tuned daemon with the list of non-isolated CPUs. This informs tuned that those CPUs are available for performing system tasks. |
| systemd.cpu_affinity=<list of reserved cpus> [7] | Sets CPU affinity for systemd processes, restricting them to the reserved CPUs. This ensures that system processes do not interfere with low-latency workloads. |
| intel_iommu=on [8] | Enables Intel Virtualization Technology for Directed I/O (VT-d) support in the Linux kernel, enabling enhanced virtualization capabilities and improved security and performance for I/O device management in virtual environments. Intel-specific setting. |
| iommu=pt [9] | Configures the input–output memory management unit (IOMMU) to operate in passthrough mode, enabling direct assignment of physical I/O devices to virtual machines in virtualized environments. This can improve performance and flexibility for certain workloads but requires careful consideration and configuration. |
| isolcpus=managed_irq,<list of isolated cpus> [10] | Keeps managed interrupts away from the isolated CPUs where possible, so that the interrupts do not interfere with low-latency workloads. |
| nohz_full=<list of isolated cpus> [11] | Configures the kernel to operate in tickless mode on the isolated CPUs. Tickless mode eliminates timer interrupts on CPUs that are idle or have only a single runnable task, improving power efficiency and reducing latency. |
| nosoftlockup [12] | Disables the soft lockup detection mechanism in the kernel. Soft lockup detection reports a CPU that stays in kernel mode for a prolonged period without giving other tasks a chance to run, which might indicate a system hang. |
| nmi_watchdog=0 [13] | Disables the non-maskable interrupt (NMI) watchdog. The NMI watchdog is used to detect system hangs or hard lockups. |
| mce=off [14] | Disables Machine Check Exception (MCE) reporting. MCE is a hardware feature that detects and reports hardware errors. |
| processor.max_cstate=1 [15] | Specifies the maximum processor idle state (C-state) allowed. Limiting the maximum C-state improves system responsiveness because the CPU wakes up faster from shallow idle states, at the cost of higher power consumption. Applies only to the ultra-low-latency profile. |
| intel_idle.max_cstate=0 [16] | Disables the intel_idle driver so that the kernel falls back to the ACPI idle driver, whose deepest idle state is governed by processor.max_cstate. Intel-specific setting. Applies only to the ultra-low-latency profile. |
| intel_pstate=disable [17] | Disables the Intel P-state driver. Intel P-state is a feature that dynamically adjusts CPU frequency and voltage for power management. Intel-specific setting. Applies when the ultra-low-latency or low-latency profile is configured. The per-pod power management profile sets this to Starting from OpenShift Container Platform 4.16, |
| rcutree.kthread_prio=11 [18] | Sets the real-time priority of the read-copy update (RCU) kernel threads to 11 so that RCU processing is not starved by lower-priority tasks and completes in a timely manner. |
| idle=poll [19] | Instructs the idle loop to continuously poll for runnable tasks instead of entering a low-power idle state and waiting for an interrupt to wake the CPU. This is useful where latency is critical and the overhead of transitioning between idle states is undesirable. Note: The |
Note: The workload hints associated with the Telco 5G RAN performance profile are:
realTime: true
highPowerConsumption: false
perPodPowerManagement: false
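To confirm that a node booted with the kernel arguments from the table above, you can compare them against the kernel command line. The following Python sketch is a minimal illustration: the helper names and the `EXPECTED_ARGS` subset are hypothetical choices for this example, and the sample command line is invented. It also checks that isolcpus, nohz_full, and rcu_nocbs name the same CPUs, because the table gives each of them the list of isolated CPUs.

```python
from typing import Dict, List, Optional, Set

# Subset of fixed-value arguments from the table, chosen for illustration.
EXPECTED_ARGS: Dict[str, Optional[str]] = {
    "skew_tick": "1",
    "tsc": "reliable",
    "nohz": "on",
    "nosoftlockup": None,  # bare flag without a value
    "nmi_watchdog": "0",
    "mce": "off",
}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    args: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        args[name] = value if sep else None
    return args


def missing_args(cmdline: str) -> List[str]:
    """List the expected arguments that are absent or set to another value."""
    present = parse_cmdline(cmdline)
    missing = []
    for name, want in EXPECTED_ARGS.items():
        if name not in present or present[name] != want:
            missing.append(name if want is None else f"{name}={want}")
    return missing


def parse_cpu_list(spec: str) -> Set[int]:
    """Expand a simple CPU list such as '2-5,8' into {2, 3, 4, 5, 8}.

    Non-numeric entries (for example the 'managed_irq' flag of isolcpus)
    are skipped. Stride syntax is not handled in this sketch.
    """
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def isolated_sets_agree(cmdline: str) -> bool:
    """True if isolcpus, nohz_full, and rcu_nocbs name the same CPUs.

    Arguments that are absent count as an empty CPU set.
    """
    present = parse_cmdline(cmdline)
    sets = [
        parse_cpu_list(present.get(name) or "")
        for name in ("isolcpus", "nohz_full", "rcu_nocbs")
    ]
    return sets[0] == sets[1] == sets[2]


# Invented sample; on a node, read the text of /proc/cmdline instead.
sample = (
    "BOOT_IMAGE=/vmlinuz skew_tick=1 tsc=reliable nohz=on nosoftlockup "
    "isolcpus=managed_irq,2-7 nohz_full=2-7 rcu_nocbs=2-7"
)
print(missing_args(sample))         # prints ['nmi_watchdog=0', 'mce=off']
print(isolated_sets_agree(sample))  # prints True
```

The check deliberately leaves out the arguments that the table marks as profile-specific or Intel-specific, so it does not flag a low-latency node for lacking ultra-low-latency settings.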
In addition, when the realTime hint is set to true, the following settings are added to the tuned configuration:
| Argument | Description |
|---|---|
| service.stalld=start,enable | Starts the stalld service and enables it at system boot. The stalld service monitors for threads that are starved of CPU time and temporarily boosts them so that they can make progress. Such stalls can indicate performance issues or resource contention. |
| sched_rt_runtime_us=-1 | Sets the portion of each scheduling period that real-time tasks are allowed to consume. A value of -1 removes the limit, allowing real-time tasks to run without restriction. Real-time tasks are those that require deterministic and low-latency execution, such as audio/video processing or industrial control systems. |
| kernel.hung_task_timeout_secs=600 | Sets the timeout, in seconds, for detecting hung tasks in the Linux kernel. When a task becomes unresponsive, the kernel might mark it as hung and trigger a notification or action. A value of 600 means that the kernel considers a task hung if it remains unresponsive for more than 10 minutes. |
| vm.stat_interval=10 | Sets the interval, in seconds, at which statistics about memory usage and performance in the virtual memory subsystem are updated. A value of 10 means that the statistics are collected every 10 seconds. |
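The three sysctl-style settings in this table can be checked on a running node by reading the matching files under /proc/sys. The following Python sketch is a hedged illustration: the helper names are hypothetical, and it assumes that sched_rt_runtime_us is exposed as kernel.sched_rt_runtime_us, which is its usual sysctl name on Linux. The stalld service is not covered because it is a service rather than a sysctl.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple

# Values taken from the table; the kernel. prefix on sched_rt_runtime_us
# is the usual full sysctl name and is an assumption of this sketch.
EXPECTED_SYSCTLS: Dict[str, str] = {
    "kernel.sched_rt_runtime_us": "-1",
    "kernel.hung_task_timeout_secs": "600",
    "vm.stat_interval": "10",
}


def sysctl_path(name: str, root: Path = Path("/proc/sys")) -> Path:
    """Map a dotted sysctl name to its file, e.g. vm.stat_interval."""
    return root.joinpath(*name.split("."))


def check_sysctls(
    expected: Mapping[str, str], root: Path = Path("/proc/sys")
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, actual)} for each setting that differs.

    The actual value is None when the file cannot be read.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            actual: Optional[str] = sysctl_path(name, root).read_text().strip()
        except OSError:
            actual = None
        if actual != want:
            mismatches[name] = (want, actual)
    return mismatches


# An empty result means that all three settings match the table.
print(check_sysctls(EXPECTED_SYSCTLS))
```

On a system where the real-time hint is not applied, the result lists each differing setting with its expected and actual value.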
References
1: Reducing CPU performance spikes
2: Optimizing RHEL 9 for Real Time for low latency operation
3: Using RCU’s CPU Stall Detector (docs.kernel.org)
4: Isolating CPUs using the nohz and nohz_full parameters
6: Performance addons operator advanced configuration
7: systemd-system.conf (www.man7.org)
8: Configuring a Host for PCI Passthrough
9: The kernel’s command-line parameters (www.kernel.org)
10: isolcpus (wiki.linuxfoundation.org)
11: Isolating CPUs using the nohz and nohz_full parameters
14: How do I disable MCE function?
15: Idle States Control Via Kernel Command Line (www.kernel.org) and Controlling power management transitions
16: Idle States Control Via Kernel Command Line (www.kernel.org)
17: intel_pstate CPU Performance Scaling Driver (www.kernel.org) and Tuning CPU frequency to optimize energy consumption
18: Kernel configuration parameters for RCU (lwn.net)
19: Idle States Control Via Kernel Command Line (www.kernel.org)