How to interpret an OOM killer message


Environment

  • Red Hat Enterprise Linux

Issue

This article will go line by line through a full OOM-killer message and explain what the information means. There are slight differences between the OOM-killer message across major RHEL versions that have been noted in the body of the Resolution section below.

For additional information about the OOM killer, please see the related Knowledgebase articles.

Resolution

The first line lists the name of the process that invoked the OOM killer, the type of memory allocation (gfp_mask), the nodes memory can be allocated from (nodemask), the size of the allocation (order), and the invoking task's oom_score_adj.

  • Note that the process that invoked the OOM killer is not necessarily the process responsible for the memory pressure; it is simply the task that requested memory after free memory fell below the minimum watermark.
Aug 10 08:07:16 localhost kernel: olm invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=996
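The order field maps directly to an allocation size: an order-N request asks the buddy allocator for 2^N contiguous pages. A minimal sketch of that arithmetic, assuming a 4 KiB page size (the x86_64 default):

```python
# Sketch: how the "order" field maps to an allocation size.
# An order-N allocation requests 2**N contiguous pages; order=0 is a
# single page. A 4096-byte page is assumed (x86_64 default).
PAGE_SIZE = 4096

def allocation_size_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Return the size in bytes of an order-N buddy allocation."""
    return (2 ** order) * page_size

# The message above shows order=0, i.e. a single 4 KiB page:
print(allocation_size_bytes(0))  # 4096
print(allocation_size_bytes(3))  # 32768 (8 contiguous pages)
```

Higher-order allocations are harder to satisfy under fragmentation, which is why the order value is worth checking first.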

The next three lines list more information about the memory allocation, the PID that invoked the OOM killer, and the hardware of the system.

Aug 10 08:07:17 localhost kernel: olm cpuset=/ mems_allowed=0
Aug 10 08:07:17 localhost kernel: CPU: 2 PID: 1614944 Comm: olm Not tainted 4.18.0-147.3.1.el8_1.x86_64 #1
Aug 10 08:07:17 localhost kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018

Following that is the kernel stack of the invoking PID at the time the OOM killer was called, followed by a CPU register dump.

Aug 10 08:07:17 localhost kernel: Call Trace:
Aug 10 08:07:17 localhost kernel:  dump_stack+0x5c/0x80
Aug 10 08:07:17 localhost kernel:  dump_header+0x6e/0x27a
Aug 10 08:07:17 localhost kernel:  oom_kill_process.cold.29+0xb/0x10
Aug 10 08:07:17 localhost kernel:  out_of_memory+0x1ba/0x490
Aug 10 08:07:17 localhost kernel:  __alloc_pages_slowpath+0xc0f/0xce0
Aug 10 08:07:17 localhost kernel:  __alloc_pages_nodemask+0x245/0x280
Aug 10 08:07:17 localhost kernel:  filemap_fault+0x3d8/0x860
Aug 10 08:07:17 localhost kernel:  ? page_add_file_rmap+0x109/0x200
Aug 10 08:07:17 localhost kernel:  ? pmd_devmap_trans_unstable+0x2a/0x40
Aug 10 08:07:17 localhost kernel:  ? alloc_set_pte+0x38a/0x480
Aug 10 08:07:17 localhost kernel:  ? _cond_resched+0x15/0x30
Aug 10 08:07:17 localhost kernel:  __xfs_filemap_fault+0x6d/0x200 [xfs]
Aug 10 08:07:17 localhost kernel:  ? alloc_set_pte+0x38a/0x480
Aug 10 08:07:17 localhost kernel:  __do_fault+0x20/0x80
Aug 10 08:07:17 localhost kernel:  do_fault+0x18d/0x3e0
Aug 10 08:07:17 localhost kernel:  ? futex_wake+0x90/0x170
Aug 10 08:07:17 localhost kernel:  __handle_mm_fault+0x539/0x6b0
Aug 10 08:07:17 localhost kernel:  handle_mm_fault+0xda/0x200
Aug 10 08:07:17 localhost kernel:  __do_page_fault+0x22b/0x4e0
Aug 10 08:07:17 localhost kernel:  do_page_fault+0x32/0x110
Aug 10 08:07:17 localhost kernel:  ? page_fault+0x8/0x30
Aug 10 08:07:17 localhost kernel:  page_fault+0x1e/0x30
Aug 10 08:07:17 localhost kernel: RIP: 0033:0x467830
Aug 10 08:07:17 localhost kernel: Code: Bad RIP value.
Aug 10 08:07:17 localhost kernel: RSP: 002b:000000c00005a748 EFLAGS: 00010206
Aug 10 08:07:17 localhost kernel: RAX: 000000c001dab7a0 RBX: 0000000000000000 RCX: 000000c007848748
Aug 10 08:07:17 localhost kernel: RDX: 0000000000467830 RSI: 000000c00004ae18 RDI: 000000c00004afe0
Aug 10 08:07:17 localhost kernel: RBP: 000000c00005a768 R08: 0000000000000080 R09: 000000000000004d
Aug 10 08:07:17 localhost kernel: R10: 00351702d77c3b40 R11: 0000000000000012 R12: 00351702d77ef7b0
Aug 10 08:07:17 localhost kernel: R13: 0000000000000013 R14: 0000000000000003 R15: 000000c0065714a0

The next section contains the Mem-Info, which breaks down how memory on the system is in use. In this particular example the slab_unreclaimable field stands out due to its size, suggesting unreclaimable slab memory may be at the root of the issue.

  • Note that the values here are in pages. As seen above, this system is x86_64, so one page is 4096 bytes (4 KiB), meaning 4989109 pages is 19.03 GiB. Page size varies by architecture; you can check yours with getconf PAGE_SIZE.
  • Note that on earlier versions of RHEL the Mem-Info section is all on one line with no newline characters. The information is still the same.
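The page-to-byte conversion from the note above can be sketched as follows (a 4 KiB page is assumed; verify with getconf PAGE_SIZE on the affected system):

```python
# Sketch: converting Mem-Info page counts into GiB.
# A 4 KiB page is assumed (x86_64 default).
PAGE_SIZE = 4096

def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to GiB."""
    return pages * page_size / 2**30

# slab_unreclaimable from the Mem-Info block above:
print(round(pages_to_gib(4989109), 2))  # 19.03
```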
Aug 10 08:07:17 localhost kernel: Mem-Info:
Aug 10 08:07:17 localhost kernel: active_anon:2871645 inactive_anon:1599 isolated_anon:0
                                       active_file:4322 inactive_file:26005 isolated_file:130
                                       unevictable:19133 dirty:0 writeback:0 unstable:0
                                       slab_reclaimable:170394 slab_unreclaimable:4989109
                                       mapped:6211 shmem:2941 pagetables:16597 bounce:0
                                       free:49952 free_pcp:1 free_cma:0

The following lines give a per-node and per-zone breakdown of how memory is used. Note that in the Node 0 Normal zone, free (61356kB) is below min (61380kB), which is why the OOM killer was invoked.

Aug 10 08:07:17 localhost kernel: Node 0 active_anon:11486580kB inactive_anon:6396kB active_file:17288kB inactive_file:104020kB unevictable:76532kB isolated(anon):0kB isolated(file):264kB mapped:24844kB dirty:0kB writeback:0kB shmem:11764kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1361920kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Aug 10 08:07:17 localhost kernel: Node 0 DMA free:15908kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 10 08:07:17 localhost kernel: lowmem_reserve[]: 0 2929 32080 32080 32080
Aug 10 08:07:17 localhost kernel: Node 0 DMA32 free:122544kB min:6168kB low:9168kB high:12168kB active_anon:1603848kB inactive_anon:0kB active_file:472kB inactive_file:3796kB unevictable:16kB writepending:0kB present:3129280kB managed:3063744kB mlocked:16kB kernel_stack:32kB pagetables:156kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 10 08:07:17 localhost kernel: lowmem_reserve[]: 0 0 29150 29150 29150
Aug 10 08:07:17 localhost kernel: Node 0 Normal free:61356kB min:61380kB low:91228kB high:121076kB active_anon:9882732kB inactive_anon:6396kB active_file:17032kB inactive_file:99196kB unevictable:76516kB writepending:0kB present:30408704kB managed:29856572kB mlocked:76516kB kernel_stack:30352kB pagetables:66232kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
Aug 10 08:07:17 localhost kernel: lowmem_reserve[]: 0 0 0 0 0
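The free-versus-min comparison can be illustrated with the per-zone values copied (in kB) from the log lines above; this is a sketch of the check, not the kernel's exact watermark logic:

```python
# Sketch: the free-vs-min comparison that led to the OOM kill.
# Values in kB, taken from the Node 0 zone lines above.
zones = {
    "DMA":    {"free": 15908,  "min": 32},
    "DMA32":  {"free": 122544, "min": 6168},
    "Normal": {"free": 61356,  "min": 61380},
}

for name, z in zones.items():
    below = z["free"] < z["min"]
    print(f"Node 0 {name}: free below min watermark: {below}")
# Only the Normal zone is below its min watermark.
```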

Next is a buddyinfo-like breakdown of memory fragmentation.

Aug 10 08:07:17 localhost kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Aug 10 08:07:17 localhost kernel: Node 0 DMA32: 834*4kB (UME) 13*8kB (UME) 632*16kB (UME) 1661*32kB (UME) 298*64kB (UME) 64*128kB (ME) 37*256kB (UM) 20*512kB (UM) 9*1024kB (ME) 0*2048kB 0*4096kB = 122896kB
Aug 10 08:07:17 localhost kernel: Node 0 Normal: 3557*4kB (UM) 73*8kB (ME) 48*16kB (UME) 28*32kB (UM) 31*64kB (UM) 17*128kB (UM) 38*256kB (ME) 23*512kB (M) 19*1024kB (M) 0*2048kB 0*4096kB = 61596kB
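Each buddyinfo-style line is a series of "count * chunk-size" columns for orders 0 through 10, and the printed total is just their sum. A sketch reproducing the Node 0 Normal total from the counts above:

```python
# Sketch: summing a buddyinfo-style line to reproduce the printed total.
# Counts are taken from the Node 0 Normal line above; each column is
# "count * chunk-size-in-kB" for orders 0 through 10 (4kB ... 4096kB).
counts = [3557, 73, 48, 28, 31, 17, 38, 23, 19, 0, 0]
sizes_kb = [4 * 2**i for i in range(11)]  # 4kB, 8kB, ..., 4096kB

total_kb = sum(c * s for c, s in zip(counts, sizes_kb))
print(total_kb)  # 61596, matching the "= 61596kB" at the end of the line
```

The empty high-order columns (0*2048kB, 0*4096kB) show that no large contiguous chunks remain in this zone.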

Huge page information (available only on RHEL 7 and later):

Aug 10 08:07:17 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Information on page cache and swap:

Aug 10 08:07:17 localhost kernel: 37574 total pagecache pages
Aug 10 08:07:17 localhost kernel: 0 pages in swap cache
Aug 10 08:07:17 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Aug 10 08:07:17 localhost kernel: Free swap  = 0kB
Aug 10 08:07:17 localhost kernel: Total swap = 0kB

The amount of RAM on the system is listed. Note the value is in pages.

Aug 10 08:07:17 localhost kernel: 8388494 pages RAM

Some information about reserved memory:

Aug 10 08:07:17 localhost kernel: 0 pages HighMem/MovableOnly
Aug 10 08:07:17 localhost kernel: 154438 pages reserved
Aug 10 08:07:17 localhost kernel: 0 pages hwpoisoned
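The reserved figure accounts for the difference between "pages RAM" and the managed memory reported per zone. This can be cross-checked from the values above (4 KiB pages assumed):

```python
# Sketch: reserved pages account for the gap between "pages RAM" and the
# managed memory reported per zone. Values are taken from the log above.
pages_ram = 8388494
pages_reserved = 154438

# Sum of managed: kB from the DMA, DMA32, and Normal zone lines:
managed_kb = 15908 + 3063744 + 29856572
managed_pages = managed_kb // 4  # 4 KiB pages

print(pages_ram - pages_reserved)  # 8234056
print(managed_pages)               # 8234056 -- the two figures agree
```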

In RHEL 8 and later, if unreclaimable slab memory exceeds total user memory, unreclaimable slab info is also printed (list truncated for the purposes of this article):

Aug 10 08:07:17 localhost kernel: Unreclaimable slab info:
Aug 10 08:07:17 localhost kernel: Name                      Used          Total
Aug 10 08:07:17 localhost kernel: rbd_obj_request           15KB         15KB
Aug 10 08:07:17 localhost kernel: ceph_osd_request         123KB        123KB
Aug 10 08:07:17 localhost kernel: sw_flow              2233895KB    2233895KB
Aug 10 08:07:17 localhost kernel: nf_conncount_rb           15KB         15KB
Aug 10 08:07:17 localhost kernel: nf_conntrack             609KB       1781KB
[...]

A list of all processes is then printed, including each one's rss (in pages) and oom_score_adj (list truncated for the purposes of this article):

Aug 10 08:07:17 localhost kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Aug 10 08:07:17 localhost kernel: [ 1553]     0  1553   112686    11651   901120        0             0 systemd-journal
Aug 10 08:07:17 localhost kernel: [ 1588]     0  1588    26474      985   241664        0         -1000 systemd-udevd
Aug 10 08:07:17 localhost kernel: [ 2553]     0  2553    19168      187   172032        0         -1000 auditd
Aug 10 08:07:17 localhost kernel: [ 2605]     0  2605    29225      467   208896        0             0 VGAuthService
Aug 10 08:07:17 localhost kernel: [ 2606]     0  2606    41968      569   323584        0             0 vmtoolsd
[...]
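Since the rss column is in pages, a quick conversion is needed to compare it against values elsewhere in the message. A sketch using the systemd-journal row above (4 KiB pages assumed):

```python
# Sketch: converting a process-table rss entry (in pages) to kB and MiB.
# systemd-journal from the table above: rss = 11651 pages.
rss_pages = 11651
rss_kb = rss_pages * 4  # 4 KiB pages assumed

print(rss_kb)                   # 46604 kB
print(round(rss_kb / 1024, 1))  # 45.5 MiB
```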

Finally, some information on the process the OOM killer chose to kill is printed. The process's RSS is the sum of anon-rss, file-rss, and shmem-rss on the first line below; total-vm should not be included in this calculation.

Aug 10 08:07:17 localhost kernel: Out of memory: Killed process 1614847 (olm) total-vm:1041276kB, anon-rss:698716kB, file-rss:0kB, shmem-rss:0kB
Aug 10 08:07:17 localhost kernel: oom_reaper: reaped process 1614847 (olm), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
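The RSS calculation described above, applied to the "Killed process" line (values in kB copied from the log):

```python
# Sketch: total RSS of the killed process is the sum of the three rss
# fields on the "Killed process" line; total-vm is excluded.
anon_rss_kb, file_rss_kb, shmem_rss_kb = 698716, 0, 0

total_rss_kb = anon_rss_kb + file_rss_kb + shmem_rss_kb
print(total_rss_kb)                   # 698716 kB
print(round(total_rss_kb / 1024, 1))  # 682.3 MiB
```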

Root Cause

The information in an OOM-killer message is often enough to diagnose the root cause of the OOM kill, barring edge cases like kernel memory leaks or hypervisor ballooning.

If you need assistance with an out-of-memory event, please open a case with Red Hat support.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.