Why does kernel-3.10.0-1160.el7 have a bigger memory footprint than kernel-3.10.0-1127.el7?
Environment
- RHEL7.9
- kernel-3.10.0-1160.el7
Issue
- Why does kernel-3.10.0-1160.el7 have a bigger memory footprint than kernel-3.10.0-1127.el7?
- Why is the memory allocated by page_cgroup more than double in the RHEL 7.9 kernel compared to the RHEL 7.8 kernel?
RHEL 7.8 (kernel-3.10.0-1127.el7):

```
# uname -r
3.10.0-1127.el7.x86_64
# free -m
              total        used        free      shared  buff/cache   available
Mem:         128677        1246      127029          13         402      126894
Swap:          4095           0        4095
# journalctl -b | grep page_cgroup
Oct 20 01:52:35 HOSTNAME kernel: allocated 536870912 bytes of page_cgroup    <<<< 512M
```
RHEL 7.9 (kernel-3.10.0-1160.2.2.el7):

```
# uname -r
3.10.0-1160.2.2.el7.x86_64
# free -m
              total        used        free      shared  buff/cache   available
Mem:         128677        4063      124215          13         398      124076
Swap:          4095           0        4095
# journalctl -b | grep page_cgroup
Oct 20 00:38:53 HOSTNAME kernel: allocated 3489660928 bytes of page_cgroup   <<<< 3328M
```
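The per-page overhead implied by these logs can be sanity-checked with a short calculation (a minimal sketch; the 128677 MiB total is taken from the `free -m` output above, and the result slightly exceeds the struct sizes because `free` excludes some kernel-reserved memory):

```python
# Divide the page_cgroup allocation reported by the kernel by the number
# of 4 KiB page frames to estimate the per-page tracking overhead.
PAGE_SIZE = 4096
mem_total_mib = 128677                            # from free -m above
pages = mem_total_mib * 1024 * 1024 // PAGE_SIZE  # ~32.9 million 4 KiB frames

overhead_1127 = 536870912 / pages    # RHEL 7.8 kernel
overhead_1160 = 3489660928 / pages   # RHEL 7.9 kernel

print(round(overhead_1127, 1))  # ~16 bytes/page  (16-byte struct page_cgroup)
print(round(overhead_1160, 1))  # ~106 bytes/page (104-byte struct, plus rounding)
```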
Resolution
- This is expected behavior in RHEL 7.9, due to the introduction of the page_owner feature.
In earlier RHEL 7.9 kernels, the page_owner feature imposed a fixed overhead of 88 bytes per memory page, regardless of whether the feature was enabled or disabled. Optimizations were integrated starting with kernel-3.10.0-1160.31.1.el7: when the page_owner feature is disabled, the overhead is only 16 bytes per memory page.
The same overhead also applies to hugepage memory. The kernel always tracks pages in 4 KiB units (on x86 architectures), and hugepages are implemented internally as a bundle of 4 KiB frames guaranteed to be physically contiguous.
Prior to kernel-3.10.0-1160.31.1.el7
- The size of struct page_cgroup increased by 88 bytes: the previous size was 16 bytes, and after the introduction of this feature the new size is 104 bytes.
- The impact on memory consumption is at most 2.15%, given the 88-byte increase in struct page_cgroup for each tracked 4 KiB page (88/4096 ≈ 0.0215).
Starting with kernel-3.10.0-1160.31.1.el7
- Changes were introduced in kernel-3.10.0-1160.31.1.el7 via RHSA-2021:2314 that significantly reduce the overhead in the default case, with page_owner off.
- The new approach reduces the feature's overhead from 88 bytes per page frame to 32 bytes per page frame, or roughly 0.78% of physical memory on an x86_64 system (32/4096 ≈ 0.0078).
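The percentages quoted above follow directly from the per-frame byte counts (a minimal arithmetic sketch):

```python
# Overhead as a fraction of RAM = bytes tracked per frame / frame size.
PAGE_SIZE = 4096  # x86_64 base page size

for label, per_page in [("prior to 1160.31.1 (page_owner cost)", 88),
                        ("1160.31.1 and later (page_owner off)", 32)]:
    pct = per_page / PAGE_SIZE * 100
    print(f"{label}: {pct:.2f}% of physical memory")
```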
Root Cause
- The RFE was backported to RHEL 7.9 via Red Hat Bugzilla bug 1781726.
- The following commits were backported:
```
commit da557dd9da7cc753be92af0af623a069132688ea
Author: Rafael Aquini <aquini@redhat.com>
Date:   Sun Apr 19 01:23:58 2020 -0400

    [redhat] redhat: configs: enable CONFIG_PAGE_OWNER

    Message-id: <89658c2db126a9605cd57e5fec70478d4f917e4f.1587185767.git.aquini@redhat.com>
    Patchwork-id: 303604
    Patchwork-instance: patchwork
    O-Subject: [RHEL7 BZ#1781726 PATCH 10/11] redhat: configs: enable CONFIG_PAGE_OWNER
    Bugzilla: 1781726
    RH-Acked-by: Don Dutile <ddutile@redhat.com>
    RH-Acked-by: Waiman Long <longman@redhat.com>
    RH-Acked-by: Aristeu Rozanski <aris@redhat.com>
    Upstream status: RHEL only

    Get CONFIG_PAGE_OWNER enabled on the RHEL kernel builds so we
    provide this debug feature to help on extending our ability to
    bug-chase kernel memory usage by 3rd party drivers.

    This will also enable CONFIG_PAGE_EXTENSION, which will cause
    struct page_cgroup to be extended, as well.

    The change is safe, though, as struct page_cgroup is not part of our
    external kABI exports, and although it will end up increased in size,
    however, its layout and current cacheline placement for flags and
    *mem_cgroup elements will remain unchanged, and the extention will not
    cause any further offset drift, as page_cgroup doesn't get embbeded into
    other data structures:

    $ pahole -i page_ext vmlinux-3.10.0-1136.el7.pgown.x86_64
    page_cgroup
    $ pahole -i page_cgroup vmlinux-3.10.0-1136.el7.pgown.x86_64

    $ pahole -C page_cgroup vmlinux-3.10.0-1136.el7.x86_64
    struct page_cgroup {
            long unsigned int          flags;         /*  0   8 */
            struct mem_cgroup *        mem_cgroup;    /*  8   8 */
            /* size: 16, cachelines: 1, members: 2 */
            /* last cacheline: 16 bytes */
    };

    $ pahole -C page_cgroup vmlinux-3.10.0-1136.el7.pgown.x86_64
    struct page_cgroup {
            long unsigned int          flags;         /*  0   8 */
            struct mem_cgroup *        mem_cgroup;    /*  8   8 */
            struct page_ext            ext;           /* 16  88 */
            /* size: 104, cachelines: 2, members: 3 */
            /* last cacheline: 40 bytes */
    };
```
```
commit e26a6d2618e71ec69833712fbae3b72f480b634a
Author: Rafael Aquini <aquini@redhat.com>
Date:   Sun Apr 19 01:23:51 2020 -0400

    [mm] mm/page_owner: keep track of page owners

    Message-id: <9afbd9229cfb62acb8bbee549fa0a27b8b06e2b5.1587185767.git.aquini@redhat.com>
    Patchwork-id: 303597
    Patchwork-instance: patchwork
    O-Subject: [RHEL7 BZ#1781726 PATCH 03/11] mm/page_owner: keep track of page owners
    Bugzilla: 1781726
    RH-Acked-by: Don Dutile <ddutile@redhat.com>
    RH-Acked-by: Waiman Long <longman@redhat.com>
    RH-Acked-by: Aristeu Rozanski <aris@redhat.com>

    This patch is a backport of the following upstream commit:

    commit 48c96a3685795e52903e60c7ee115e5e22e7d640
    Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Date:   Fri Dec 12 16:56:01 2014 -0800

        mm/page_owner: keep track of page owners

        This is the page owner tracking code which is introduced so far ago.  It
        is resident on Andrew's tree, though, nobody tried to upstream so it
        remain as is.  Our company uses this feature actively to debug memory leak
        or to find a memory hogger so I decide to upstream this feature.

        This functionality help us to know who allocates the page.  When
        allocating a page, we store some information about allocation in extra
        memory.  Later, if we need to know status of all pages, we can get and
        analyze it from this stored information.

        In previous version of this feature, extra memory is statically defined in
        struct page, but, in this version, extra memory is allocated outside of
        struct page.  It enables us to turn on/off this feature at boottime
        without considerable memory waste.

        Although we already have tracepoint for tracing page allocation/free,
        using it to analyze page owner is rather complex.  We need to enlarge the
        trace buffer for preventing overlapping until userspace program launched.
        And, launched program continually dump out the trace buffer for later
        analysis and it would change system behaviour with more possibility rather
        than just keeping it in memory, so bad for debug.

        Moreover, we can use page_owner feature further for various purposes.  For
        example, we can use it for fragmentation statistics implemented in this
        patch.  And, I also plan to implement some CMA failure debugging feature
        using this interface.

        I'd like to give the credit for all developers contributed this feature,
        but, it's not easy because I don't know exact history.  Sorry about that.
        Below is people who has "Signed-off-by" in the patches in Andrew's tree.
```
- Improvements to the feature were made via Red Hat Bugzilla bug 1948451: https://bugzilla.redhat.com/show_bug.cgi?id=1948451
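For reference, this is how the backported page_owner feature is typically exercised, per the upstream page_owner documentation (a sketch, assuming debugfs is mounted at /sys/kernel/debug; no recorded data appears unless the feature was enabled at boot):

```shell
# Confirm the feature is compiled into the running kernel
grep CONFIG_PAGE_OWNER /boot/config-$(uname -r)

# Tracking is off by default; enable it by booting with the kernel
# command-line parameter:
#   page_owner=on

# With tracking enabled, dump the recorded allocation stacks for analysis
cat /sys/kernel/debug/page_owner > page_owner_full.txt
```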
Why are hugepages affected, as well?
Hugepage memory incurs the same overhead. Hugepages are a run-time performance feature, not a memory-reduction feature. On all architectures, per-page overhead is based on the base page size, which is 4 KiB on x86 systems. Hugepages can also be broken down into base-page-size elements. For example, a VM configured with hugepages is migrated in chunks of the base page size (4 KiB); otherwise the VM would be unlikely to ever converge during migration. There are other cases where hugepages are broken down into, and managed by, the smaller core-architecture page size.
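Because every 4 KiB frame inside a hugepage is still tracked individually, the overhead scales with the number of constituent frames (a minimal sketch using the common 2 MiB x86_64 hugepage size):

```python
# A 2 MiB hugepage is a bundle of physically contiguous 4 KiB frames,
# each of which the kernel tracks with its own page_cgroup entry.
PAGE_SIZE = 4 * 1024
HUGEPAGE_SIZE = 2 * 1024 * 1024

frames = HUGEPAGE_SIZE // PAGE_SIZE       # 512 frames per 2 MiB hugepage
overhead = frames * 88                    # pre-1160.31.1: 88 extra bytes/frame

print(frames)    # 512
print(overhead)  # 45056 bytes (~44 KiB) of extra tracking per 2 MiB hugepage
```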
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.