Why does kernel-3.10.0-1160.el7 have a larger memory footprint than kernel-3.10.0-1127.el7?


Environment

  • RHEL7.9
  • kernel-3.10.0-1160.el7

Issue

  • Why does kernel-3.10.0-1160.el7 have a larger memory footprint than kernel-3.10.0-1127.el7?
  • Why is the memory allocated by page_cgroup more than double on the RHEL7.9 kernel compared to RHEL7.8?
# uname -r 
3.10.0-1127.el7.x86_64

# free -m
              total        used        free      shared  buff/cache   available
Mem:         128677        1246      127029          13         402      126894
Swap:          4095           0        4095

# journalctl -b|grep page_cgroup
Oct 20 01:52:35 HOSTNAME kernel: allocated 536870912 bytes of page_cgroup     <<<<  512M


# uname -r
3.10.0-1160.2.2.el7.x86_64

# free -m
              total        used        free      shared  buff/cache   available
Mem:         128677        4063      124215          13         398      124076
Swap:          4095           0        4095

# journalctl -b|grep page_cgroup
Oct 20 00:38:53 HOSTNAME kernel: allocated 3489660928 bytes of page_cgroup     <<<<    3328M

Resolution

  • This is expected behavior in RHEL7.9 due to the introduction of the page_owner feature.

The page_owner feature caused a fixed overhead of 88 bytes per memory page in earlier versions of Red Hat Enterprise Linux 7.9, regardless of whether the feature was enabled or disabled. Optimizations were integrated starting with kernel-3.10.0-1160.31.1.el7: from then on, when the page_owner feature is disabled, the overhead is only 16 bytes per memory page.

This also generates the same overhead for hugepage memory: the kernel always tracks pages in 4 KiB units (on x86 architectures), and hugepages are internally implemented as a bundle of 4 KiB frames guaranteed to be physically contiguous.

Prior to kernel-3.10.0-1160.31.1.el7

  • The size of struct page_cgroup increased by 88 bytes: the previous size was 16 bytes, and after the introduction of this feature the new size of struct page_cgroup is 104 bytes.
  • Given the 88-byte growth of struct page_cgroup per tracked 4 KiB page, the impact on memory consumption is at most 2.15% (88/4096 = 0.02148).
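The journalctl allocations shown in the Issue section line up with this arithmetic. A quick sketch, assuming 128 GiB of installed RAM on the host above (free reports slightly less because kernel reservations are excluded):

```shell
# 128 GiB of installed RAM tracked in 4 KiB pages
pages=$(( 128 * 1024 * 1024 * 1024 / 4096 ))   # 33554432 pages
echo $(( pages * 16 ))     # 536870912 bytes  (512 MiB)  -> RHEL7.8 page_cgroup line
echo $(( pages * 104 ))    # 3489660928 bytes (3328 MiB) -> RHEL7.9 page_cgroup line
```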

Starting with kernel-3.10.0-1160.31.1.el7

  • Changes introduced in kernel-3.10.0-1160.31.1.el7 via RHSA-2021:2314 significantly reduce the overhead in the default case, with page_owner off.
  • The new approach reduces the feature's overhead from 88 bytes per page frame to 32 bytes per page frame, or roughly 0.78% of physical memory on an x86_64 system.
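The percentages quoted above follow directly from the per-page sizes; a minimal check of the arithmetic:

```shell
# overhead as a fraction of a 4096-byte page
awk 'BEGIN { printf "%.4f\n", 88 / 4096 }'   # 0.0215 -> ~2.15% before the fix
awk 'BEGIN { printf "%.4f\n", 32 / 4096 }'   # 0.0078 -> ~0.78% with page_owner off
```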

Root Cause

commit da557dd9da7cc753be92af0af623a069132688ea
Author: Rafael Aquini <aquini@redhat.com>
Date:   Sun Apr 19 01:23:58 2020 -0400

    [redhat] redhat: configs: enable CONFIG_PAGE_OWNER
    
    Message-id: <89658c2db126a9605cd57e5fec70478d4f917e4f.1587185767.git.aquini@redhat.com>
    Patchwork-id: 303604
    Patchwork-instance: patchwork
    O-Subject: [RHEL7 BZ#1781726 PATCH 10/11] redhat: configs: enable CONFIG_PAGE_OWNER
    Bugzilla: 1781726
    RH-Acked-by: Don Dutile <ddutile@redhat.com>
    RH-Acked-by: Waiman Long <longman@redhat.com>
    RH-Acked-by: Aristeu Rozanski <aris@redhat.com>
    
    Upstream status: RHEL only
    
    Get CONFIG_PAGE_OWNER enabled on the RHEL kernel builds so we
    provide this debug feature to help on extending our ability to
    bug-chase kernel memory usage by 3rd party drivers.
    
    This will also enable CONFIG_PAGE_EXTENSION, which will cause
    struct page_cgroup to be extended, as well.
    The change is safe, though, as struct page_cgroup is not part of our
    external kABI exports, and although it will end up increased in size,
    however, its layout and current cacheline placement for flags and
    *mem_cgroup elements will remain unchanged, and the extention will not
    cause any further offset drift, as page_cgroup doesn't get embbeded into
    other data structures:
    
     $ pahole -i page_ext vmlinux-3.10.0-1136.el7.pgown.x86_64
     page_cgroup
     $ pahole -i page_cgroup vmlinux-3.10.0-1136.el7.pgown.x86_64
    
     $ pahole -C page_cgroup vmlinux-3.10.0-1136.el7.x86_64
     struct page_cgroup {
            long unsigned int          flags;                /*     0     8 */
            struct mem_cgroup *        mem_cgroup;           /*     8     8 */
    
            /* size: 16, cachelines: 1, members: 2 */
            /* last cacheline: 16 bytes */
     };
     $ pahole -C page_cgroup vmlinux-3.10.0-1136.el7.pgown.x86_64
     struct page_cgroup {
            long unsigned int          flags;                /*     0     8 */
            struct mem_cgroup *        mem_cgroup;           /*     8     8 */
            struct page_ext            ext;                  /*    16    88 */
    
            /* size: 104, cachelines: 2, members: 3 */
            /* last cacheline: 40 bytes */
     };


commit e26a6d2618e71ec69833712fbae3b72f480b634a
Author: Rafael Aquini <aquini@redhat.com>
Date:   Sun Apr 19 01:23:51 2020 -0400

    [mm] mm/page_owner: keep track of page owners
    
    Message-id: <9afbd9229cfb62acb8bbee549fa0a27b8b06e2b5.1587185767.git.aquini@redhat.com>
    Patchwork-id: 303597
    Patchwork-instance: patchwork
    O-Subject: [RHEL7 BZ#1781726 PATCH 03/11] mm/page_owner: keep track of page owners
    Bugzilla: 1781726
    RH-Acked-by: Don Dutile <ddutile@redhat.com>
    RH-Acked-by: Waiman Long <longman@redhat.com>
    RH-Acked-by: Aristeu Rozanski <aris@redhat.com>
    
    This patch is a backport of the following upstream commit:
    commit 48c96a3685795e52903e60c7ee115e5e22e7d640
    Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Date:   Fri Dec 12 16:56:01 2014 -0800
    
        mm/page_owner: keep track of page owners
    
        This is the page owner tracking code which is introduced so far ago.  It
        is resident on Andrew's tree, though, nobody tried to upstream so it
        remain as is.  Our company uses this feature actively to debug memory leak
        or to find a memory hogger so I decide to upstream this feature.
    
        This functionality help us to know who allocates the page.  When
        allocating a page, we store some information about allocation in extra
        memory.  Later, if we need to know status of all pages, we can get and
        analyze it from this stored information.
    
        In previous version of this feature, extra memory is statically defined in
        struct page, but, in this version, extra memory is allocated outside of
        struct page.  It enables us to turn on/off this feature at boottime
        without considerable memory waste.
    
        Although we already have tracepoint for tracing page allocation/free,
        using it to analyze page owner is rather complex.  We need to enlarge the
        trace buffer for preventing overlapping until userspace program launched.
        And, launched program continually dump out the trace buffer for later
        analysis and it would change system behaviour with more possibility rather
        than just keeping it in memory, so bad for debug.
    
        Moreover, we can use page_owner feature further for various purposes.  For
        example, we can use it for fragmentation statistics implemented in this
        patch.  And, I also plan to implement some CMA failure debugging feature
        using this interface.
    
        I'd like to give the credit for all developers contributed this feature,
        but, it's not easy because I don't know exact history.  Sorry about that.
        Below is people who has "Signed-off-by" in the patches in Andrew's tree.

Why are hugepages affected, as well?

This also generates the same overhead for hugepage memory. Hugepages are a run-time performance feature, not a memory-reduction feature: on all architectures, per-page overhead is based on the primary page size, which is 4 KiB on x86 systems. Hugepages can be broken down into smaller page-size elements. For example, a VM configured with hugepages is migrated in chunks of the primary page size (4 KiB); otherwise, it would be unlikely the VM would ever converge during migration. There are other cases in which hugepages are broken down into, and managed by, the smaller core-architecture page size.
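Because every hugepage is tracked as its constituent 4 KiB frames, the per-frame overhead multiplies accordingly. A rough sketch for the common 2 MiB hugepage size on x86_64 (an assumed size for illustration), using the pre-fix 88-byte figure from the Resolution section:

```shell
frames=$(( 2 * 1024 * 1024 / 4096 ))   # 512 base 4 KiB frames per 2 MiB hugepage
echo $(( frames * 88 ))                # 45056 bytes of extra tracking per hugepage
```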
