Out of memory (OOM) killer in memory cgroup
Environment
- Red Hat Enterprise Linux
- The system kills a process when a memory cgroup runs out of memory after reaching its cgroup memory limit.
Issue
- The system kills a process when a memory cgroup runs out of memory after reaching its cgroup memory limit:
Nov 1 16:11:42 lab kernel: s1-agent invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Nov 1 16:11:42 lab kernel: s1-agent cpuset=/ mems_allowed=0-1
Nov 1 16:11:42 lab kernel: CPU: 45 PID: 13331 Comm: s1-agent Tainted: G W ------------ 3.10.0-957.21.3.el7.x86_64 #1
Nov 1 16:11:42 lab kernel: Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018
Nov 1 16:11:42 lab kernel: Call Trace:
Nov 1 16:11:42 lab kernel: [<ffffffffa4763107>] dump_stack+0x19/0x1b
Nov 1 16:11:42 lab kernel: [<ffffffffa475db2a>] dump_header+0x90/0x229
Nov 1 16:11:42 lab kernel: [<ffffffffa41ba386>] ? find_lock_task_mm+0x56/0xc0
Nov 1 16:11:42 lab kernel: [<ffffffffa42317a8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Nov 1 16:11:42 lab kernel: [<ffffffffa41ba834>] oom_kill_process+0x254/0x3d0
Nov 1 16:11:42 lab kernel: [<ffffffffa4235586>] mem_cgroup_oom_synchronize+0x546/0x570
Nov 1 16:11:42 lab kernel: [<ffffffffa4234a00>] ? mem_cgroup_charge_common+0xc0/0xc0
Nov 1 16:11:42 lab kernel: [<ffffffffa41bb0c4>] pagefault_out_of_memory+0x14/0x90
Nov 1 16:11:42 lab kernel: [<ffffffffa475c032>] mm_fault_error+0x6a/0x157
Nov 1 16:11:42 lab kernel: [<ffffffffa47707c8>] __do_page_fault+0x3c8/0x4f0
Nov 1 16:11:42 lab kernel: [<ffffffffa4770925>] do_page_fault+0x35/0x90
Nov 1 16:11:42 lab kernel: [<ffffffffa476c768>] page_fault+0x28/0x30
Nov 1 16:11:42 lab kernel: Task in /agent killed as a result of limit of /agent
Nov 1 16:11:42 lab kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 1559756
Nov 1 16:11:42 lab kernel: memory+swap: usage 1048752kB, limit 9007199254740988kB, failcnt 0
Nov 1 16:11:42 lab kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 1 16:11:42 lab kernel: Memory cgroup stats for /agent: cache:136KB rss:1048440KB rss_huge:0KB mapped_file:0KB swap:176KB inactive_anon:528216KB active_anon:520188KB inactive_file:44KB active_file:24KB unevictable:0KB
Nov 1 16:11:42 lab kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Nov 1 16:11:42 lab kernel: [13293] 992 13293 3411343 264627 985 61 0 s1-agent
Nov 1 16:11:42 lab kernel: Memory cgroup out of memory: Kill process 22811 (s1-agent) score 112 or sacrifice child
Nov 1 16:11:42 lab kernel: Killed process 13293 (s1-agent) total-vm:13645372kB, anon-rss:1052332kB, file-rss:6176kB, shmem-rss:0kB
Resolution
- If the killed process belongs to a third-party service, check its memory utilization requirements with the vendor and increase the process's "MemoryLimit=" value according to their recommendations.
- Each cgroup maintains a per-cgroup LRU that has the same structure as the global VM. When a cgroup goes over its limit, the kernel first tries to reclaim memory from the cgroup to make space for the new pages the cgroup has touched. If reclaim is unsuccessful, an OOM routine is invoked to select and kill the bulkiest task in the cgroup.
- Additionally, you can set "vm.panic_on_oom" to "2", which will capture a vmcore (not recommended unless you require advanced troubleshooting/analysis). With "vm.panic_on_oom" set to 2, the kernel panics even when the OOM occurs inside a memory cgroup, so the whole system panics.
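As an illustrative sketch of what that setting involves (the helper names below are not part of any Red Hat tool; writing the sysctl requires root on a running system, and capturing a vmcore also assumes kdump is configured):

```python
# Illustrative sketch; function names are hypothetical.
VALID_PANIC_ON_OOM = {0, 1, 2}

def panic_on_oom_setting(value: int) -> str:
    """Validate a vm.panic_on_oom value and return the string to write.
    0 = no panic on OOM, 1 = panic on system-wide OOM only,
    2 = panic even on a memory-cgroup OOM (lets kdump capture a vmcore)."""
    if value not in VALID_PANIC_ON_OOM:
        raise ValueError(f"vm.panic_on_oom must be 0, 1, or 2, got {value}")
    return str(value)

def set_panic_on_oom(value: int) -> None:
    """Write the sysctl via procfs; requires root."""
    with open("/proc/sys/vm/panic_on_oom", "w") as f:
        f.write(panic_on_oom_setting(value))
```

To make the value persist across reboots, the equivalent line would go in /etc/sysctl.conf or a /etc/sysctl.d/ drop-in.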
Possible workaround:
- The memory.oom_control file is used for OOM notification and other controls.
- The memory cgroup implements an OOM notifier using the cgroup notification API (see cgroups.txt). It allows multiple OOM notification handlers to be registered and delivers a notification when an OOM occurs.
- To register a notifier, an application must:
- create an eventfd using eventfd(2)
- open the memory.oom_control file
- write a string like "<event_fd> <fd of memory.oom_control>" to cgroup.event_control
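The registration steps above can be sketched in Python (a minimal, hedged example: it assumes Python 3.10+ on Linux for os.eventfd, a cgroup-v1 memory controller mounted at the given path, and the function names are illustrative):

```python
import os

def event_control_line(event_fd: int, oom_control_fd: int) -> str:
    """cgroup.event_control expects "<event_fd> <fd of memory.oom_control>"."""
    return f"{event_fd} {oom_control_fd}"

def register_oom_notifier(cgroup_path: str) -> int:
    """Register for OOM notifications on a cgroup-v1 memory cgroup.
    Returns the eventfd; a blocking 8-byte read on it completes on OOM."""
    efd = os.eventfd(0)                                    # step 1: eventfd(2)
    cfd = os.open(os.path.join(cgroup_path, "memory.oom_control"),
                  os.O_RDONLY)                             # step 2: open oom_control
    with open(os.path.join(cgroup_path, "cgroup.event_control"), "w") as f:
        f.write(event_control_line(efd, cfd))              # step 3: register
    os.close(cfd)
    return efd
```

A monitoring process would call os.read(efd, 8) in a loop; each completed read indicates one OOM event in the cgroup.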
- The application will be notified through the eventfd when an OOM occurs.
- OOM notification does not work for the root cgroup.
- You can disable the OOM killer by writing "1" to the memory.oom_control file:
# echo 1 > memory.oom_control
- If the OOM killer is disabled, tasks under the cgroup will hang/sleep in the memory cgroup's OOM waitqueue when they request accountable memory.
- To get them running again, you have to relax the memory cgroup's OOM status by enlarging the limit or reducing usage. To reduce usage:
* kill some tasks
* move some tasks to another group with account migration
* remove some files (on tmpfs?)
Then the stopped tasks will work again.
- At reading, the current OOM status is shown:
oom_kill_disable 0 or 1 (if 1, the OOM killer is disabled)
under_oom 0 or 1 (if 1, the memory cgroup is under OOM and tasks may be stopped)
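Since memory.oom_control reads back as plain "key value" lines, the status is easy to consume programmatically; a small parser sketch (the sample text is illustrative):

```python
def parse_oom_control(text: str) -> dict:
    """Parse 'key value' lines read from memory.oom_control into a dict."""
    return {key: int(value)
            for key, value in (line.split(None, 1)
                               for line in text.splitlines() if line.strip())}

# Sample contents, as read from a cgroup's memory.oom_control file
sample = "oom_kill_disable 0\nunder_oom 0\n"
status = parse_oom_control(sample)
# status["under_oom"] == 1 would mean tasks in the cgroup may be stopped
```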
Troubleshooting
- Sometimes a user might find that the application under a cgroup is terminated by the OOM killer.
- There are several causes for this:
- The cgroup limit is too low (just too low to do anything useful)
- The user is using anonymous memory and swap is turned off or too low
- A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of some of the pages cached in the cgroup (page cache pages).
Test Case
- In the test case below, we test the memory limit for the httpd.service unit, which is an RHEL component.
- Checking the httpd service status:
[root@localhost ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system.control/httpd.service.d
└─50-MemoryLimit.conf
Active: active (running) since Fri 2022-05-13 09:33:47 EDT; 6min ago
Docs: man:httpd.service(8)
Process: 34175 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 33676 (httpd)
Status: "Running, listening on: port 80"
Tasks: 213 (limit: 11260)
Memory: 9.7M (limit: 10.0M) <------------------------ 10 MB Limit
CGroup: /system.slice/httpd.service
├─33676 /usr/sbin/httpd -DFOREGROUND
├─34176 /usr/sbin/httpd -DFOREGROUND
├─34177 /usr/sbin/httpd -DFOREGROUND
├─34178 /usr/sbin/httpd -DFOREGROUND
└─34179 /usr/sbin/httpd -DFOREGROUND
May 13 09:33:47 localhost.localdomain systemd[1]: Started The Apache HTTP Server.
May 13 09:33:47 localhost.localdomain httpd[33676]: Server configured, listening on: port 80
May 13 09:35:17 localhost.localdomain systemd[1]: Reloading The Apache HTTP Server.
May 13 09:35:17 localhost.localdomain httpd[33948]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to>
May 13 09:35:17 localhost.localdomain systemd[1]: Reloaded The Apache HTTP Server.
May 13 09:35:17 localhost.localdomain httpd[33676]: Server configured, listening on: port 80
May 13 09:35:44 localhost.localdomain systemd[1]: Reloading The Apache HTTP Server.
May 13 09:35:46 localhost.localdomain httpd[34175]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to>
May 13 09:35:46 localhost.localdomain systemd[1]: Reloaded The Apache HTTP Server.
May 13 09:35:46 localhost.localdomain httpd[33676]: Server configured, listening on: port 80
- Further checking the memory limit (MemoryLimit.conf) from the httpd service file.
[root@localhost ~]# cat /etc/systemd/system.control/httpd.service.d/50-MemoryLimit.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Service]
MemoryLimit=10485760 <----------- 10240 KB <------ 10 MB
Checking “memory.limit_in_bytes”.
[root@localhost ~]# cat /sys/fs/cgroup/memory/system.slice/httpd.service/memory.limit_in_bytes
10485760 <----------- 10240 KB <------ 10 MB
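A quick sanity check of the conversion shown above (pure arithmetic, nothing RHEL-specific):

```python
limit_bytes = 10485760            # value in MemoryLimit= and memory.limit_in_bytes
limit_kib = limit_bytes // 1024   # 10240 KB
limit_mib = limit_kib // 1024     # 10 MB
```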
memory.stat
- This breaks down the cgroup’s memory footprint into different types of memory, type-specific details, and other information on the state and past events of the memory management system. All memory amounts are in bytes.
- The entries are ordered to be human readable, and new entries can show up in the middle. Don’t rely on items remaining in a fixed position; use the keys to look up specific values!
- memory.stat reports a wide range of memory statistics. Below is the output of the memory statistics for httpd.service:
[root@localhost ~]# cat /sys/fs/cgroup/memory/system.slice/httpd.service/memory.stat
cache 4866048
rss 15142912
rss_huge 8388608
shmem 270336
mapped_file 3514368
dirty 0
writeback 0
swap 0
pgpgin 3201
pgpgout 321
pgfault 3663
pgmajfault 33
inactive_anon 15499264
active_anon 0
inactive_file 2297856
active_file 2297856
unevictable 0
hierarchical_memory_limit 104857600
hierarchical_memsw_limit 9223372036854771712
total_cache 4866048
total_rss 15142912
total_rss_huge 8388608
total_shmem 270336
total_mapped_file 3514368
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 3201
total_pgpgout 321
total_pgfault 3663
total_pgmajfault 33
total_inactive_anon 15499264
total_active_anon 0
total_inactive_file 2297856
total_active_file 2297856
total_unevictable 0
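Because entry order is not guaranteed, fields should be looked up by key rather than by position; a minimal parser sketch (the sample is a trimmed excerpt of the output above):

```python
def parse_memory_stat(text: str) -> dict:
    """Parse memory.stat 'key value' lines into a dict; look fields up
    by key, since new entries can appear in the middle of the file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

# Trimmed sample from the httpd.service output above
sample = "cache 4866048\nrss 15142912\nswap 0\n"
rss = parse_memory_stat(sample)["rss"]   # 15142912
```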
For more information on the memory cgroup v2, please refer to the link below.
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability.
Root Cause
- Out of memory (OOM) killer in memory cgroup.
Diagnostic Steps
- sosreport analysis:
Nov 1 16:11:42 lab kernel: s1-agent invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Nov 1 16:11:42 lab kernel: s1-agent cpuset=/ mems_allowed=0-1
Nov 1 16:11:42 lab kernel: CPU: 45 PID: 13331 Comm: s1-agent Tainted: G W ------------ 3.10.0-957.21.3.el7.x86_64 #1
Nov 1 16:11:42 lab kernel: Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018
Nov 1 16:11:42 lab kernel: Call Trace:
Nov 1 16:11:42 lab kernel: [<ffffffffa4763107>] dump_stack+0x19/0x1b
Nov 1 16:11:42 lab kernel: [<ffffffffa475db2a>] dump_header+0x90/0x229
Nov 1 16:11:42 lab kernel: [<ffffffffa41ba386>] ? find_lock_task_mm+0x56/0xc0
Nov 1 16:11:42 lab kernel: [<ffffffffa42317a8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Nov 1 16:11:42 lab kernel: [<ffffffffa41ba834>] oom_kill_process+0x254/0x3d0
Nov 1 16:11:42 lab kernel: [<ffffffffa4235586>] mem_cgroup_oom_synchronize+0x546/0x570
Nov 1 16:11:42 lab kernel: [<ffffffffa4234a00>] ? mem_cgroup_charge_common+0xc0/0xc0
Nov 1 16:11:42 lab kernel: [<ffffffffa41bb0c4>] pagefault_out_of_memory+0x14/0x90
Nov 1 16:11:42 lab kernel: [<ffffffffa475c032>] mm_fault_error+0x6a/0x157
Nov 1 16:11:42 lab kernel: [<ffffffffa47707c8>] __do_page_fault+0x3c8/0x4f0
Nov 1 16:11:42 lab kernel: [<ffffffffa4770925>] do_page_fault+0x35/0x90
Nov 1 16:11:42 lab kernel: [<ffffffffa476c768>] page_fault+0x28/0x30
Nov 1 16:11:42 lab kernel: Task in /agent killed as a result of limit of /agent
Nov 1 16:11:42 lab kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 1559756
Nov 1 16:11:42 lab kernel: memory+swap: usage 1048752kB, limit 9007199254740988kB, failcnt 0
Nov 1 16:11:42 lab kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 1 16:11:42 lab kernel: Memory cgroup stats for /agent: cache:136KB rss:1048440KB rss_huge:0KB mapped_file:0KB swap:176KB inactive_anon:528216KB active_anon:520188KB inactive_file:44KB active_file:24KB unevictable:0KB
Nov 1 16:11:42 lab kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Nov 1 16:11:42 lab kernel: [13293] 992 13293 3411343 264627 985 61 0 s1-agent
Nov 1 16:11:42 lab kernel: Memory cgroup out of memory: Kill process 22811 (s1-agent) score 112 or sacrifice child
Nov 1 16:11:42 lab kernel: Killed process 13293 (s1-agent) total-vm:13645372kB, anon-rss:1052332kB, file-rss:6176kB, shmem-rss:0kB
- Let's check the memory limit set for the above process. We can see "memory.limit_in_bytes" is set to 1048576 kB, and from the log messages we can see the process had already used 1048576 kB.
usage 1048576kB <- memory.usage_in_bytes # shows current memory usage
limit 1048576kB <- memory.limit_in_bytes # the memory usage limit set
We can see the memory limit has been set for the agent cgroup:
$ cat /sys/fs/cgroup/memory/agent/memory.limit_in_bytes
1073741824 <---- 1048576 in kB
The memory.limit_in_bytes parameter represents the amount of memory that is made available to all processes within a certain cgroup.
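The kernel's OOM report prints the limit in kB while memory.limit_in_bytes holds bytes; the two values above agree, as a quick arithmetic check shows:

```python
limit_in_bytes = 1073741824                # from memory.limit_in_bytes
limit_kb = limit_in_bytes // 1024          # 1048576 kB, as printed in the OOM log
limit_gib = limit_in_bytes // (1024 ** 3)  # i.e. a 1 GiB cgroup limit
```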
- From the log messages we can see "failcnt 1559756". memory.failcnt reports the number of times that memory usage has hit the limit set in memory.limit_in_bytes.
failcnt 1559756 <- memory.failcnt # the number of times memory usage hit the limit
$ cat /sys/fs/cgroup/memory/agent/memory.failcnt
1559756
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.