How to find out whether VMware's ballooning driver is consuming memory?
Environment
- Red Hat Enterprise Linux 5
- VMware ballooning driver (vmmemctl)
Issue
- The VMware ballooning driver appears to be consuming memory, but it is not possible to prove this from standard memory statistics.
Resolution
From vmcore analysis:
In short, after reviewing mm/slab.c, the conclusion is that the caller of __alloc_pages() (the low-level page allocator) is responsible for maintaining statistics on the number of pages it has allocated; the slab subsystem, for example, does this.
Let's take a look at the backtrace of the vmmemctl kernel thread. Notice the call to the __alloc_pages() routine:
crash> bt ffff8101fb5a37a0
PID: 2789 TASK: ffff8101fb5a37a0 CPU: 2 COMMAND: "vmmemctl"
#0 [ffff8101fb467980] schedule at ffffffff80062fa0
#1 [ffff8101fb467a58] __cond_resched at ffffffff8009023a
#2 [ffff8101fb467a68] cond_resched at ffffffff800630d5
#3 [ffff8101fb467a78] shrink_inactive_list at ffffffff800cd951
#4 [ffff8101fb467c68] shrink_zone at ffffffff800132bc
#5 [ffff8101fb467ca8] try_to_free_pages at ffffffff800ce8ba
#6 [ffff8101fb467d38] __alloc_pages at ffffffff8000f5f6
#7 [ffff8101fb467da8] alloc_page_interleave at ffffffff800238e7
#8 [ffff8101fb467dc8] OS_ReservedPageAlloc at ffffffff88413584 [vmmemctl]
#9 [ffff8101fb467dd8] Balloon_QueryAndExecute at ffffffff88413b88 [vmmemctl]
#10 [ffff8101fb467e58] Balloon_QueryAndExecute at ffffffff88413a67 [vmmemctl]
#11 [ffff8101fb467e68] OS_Yield at ffffffff88413696 [vmmemctl]
#12 [ffff8101fb467ee8] kthread at ffffffff80032755
#13 [ffff8101fb467f48] kernel_thread at ffffffff8005dfb1
Notice that pages are being requested from the Normal zone:
- Register rsi is populated from register rbx, which contains the pointer to the zone object
- Register rbx is "saved" on the stack at 0xffff8101fb467c70 [*]
[*]: This register was not modified between 'try_to_free_pages+0x186' and 'shrink_zone+0xf'
// rsi = rbx (zone)
0xffffffff800ce8a8 <try_to_free_pages+0x179>: mov %rbx,%rsi
... // shrink_zone()
0xffffffff800ce8b5 <try_to_free_pages+0x186>: callq 0xffffffff80013195
...
// save rbx on stack
0xffffffff800131a4 <shrink_zone+0xf>: push %rbx
                                    zone
ffff8101fb467c70: 0000000c000dffce ffff81000001a600
                                   ^^^^^^^^^^^^^^^^
ffff8101fb467c80: 000000000000000c 0000000000000000
ffff8101fb467c90: 000000000000000c ffff8101fb467d60
ffff8101fb467ca0: 0000000000000000 ffffffff800ce8ba
crash> p -x ((struct zone *)0xffff81000001a600)->name
$2 = 0xffffffff802bca0f "Normal"
- 270 tasks are in the page-frame reclaim code, and vmmemctl is one of them via the direct reclaim path, as indicated by the tsk->flags field, in which PF_MEMALLOC is set. Also note that try_to_free_pages() is called directly by 'vmmemctl'; this function is only called when there is a serious shortage of free memory pages.
crash> p -d ((struct zone *)0xffff81000001a600)->reclaim_in_progress
$3 = {
counter = 270
}
crash> p -x ((struct task_struct *)0xffff8101fb5a37a0)->flags
$4 = 0x10000840 = (PF_MEMALLOC|PF_MEMPOLICY|PF_FORKNOEXEC)
- So, does the kernel account for memory used by a particular kernel module that calls __alloc_pages()?
The answer is no, it does not; the caller needs to maintain these statistics itself. __alloc_pages() has no concept of what the pages are being requested for, and there is no argument to __alloc_pages() that would tell it anything about the intended use of the pages.
What is known is that VMware's 'vmmemctl' behaves in this manner ("ballooning") by design and is quite possibly the culprit.
Looking at the slab allocator, for instance, kmem_getpages() maintains its own statistics via add_zone_page_state(page_zone(page), NR_SLAB, nr_pages) when its call to __alloc_pages() (via alloc_pages_node()) succeeds:
static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
	struct page *page;
	int nr_pages;
	int i;

#ifndef CONFIG_MMU
	/*
	 * Nommu uses slab's for process anonymous memory allocations, and thus
	 * requires __GFP_COMP to properly refcount higher order allocations
	 */
	flags |= __GFP_COMP;
#endif

	flags |= cachep->gfpflags;

	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
	if (!page)
		return NULL;

	nr_pages = (1 << cachep->gfporder);
	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
		atomic_add(nr_pages, &slab_reclaim_pages);
	add_zone_page_state(page_zone(page), NR_SLAB, nr_pages);
	for (i = 0; i < nr_pages; i++)
		__SetPageSlab(page + i);
	return page_address(page);
}
The call flow is similar to:
-> kmem_getpages()
-> alloc_pages_node()
-> __alloc_pages()
<- __alloc_pages()
<- alloc_pages_node()
-> add_zone_page_state()
crash> kmem -V
VM_STAT:
NR_ANON_PAGES: 724169
NR_FILE_MAPPED: 838
NR_FILE_PAGES: 204681
NR_SLAB: 15177
^^^^^
NR_PAGETABLE: 32088
NR_FILE_DIRTY: 0
NR_WRITEBACK: 0
NR_UNSTABLE_NFS: 0
NR_BOUNCE: 0
NUMA_HIT: 21025628202
NUMA_MISS: 0
NUMA_FOREIGN: 0
NUMA_INTERLEAVE_HIT: 966444526
NUMA_LOCAL: 21025628202
NUMA_OTHER: 0
- So the kernel has no mechanism to account for memory requested through the __alloc_pages() routine, and vmmemctl uses this routine. Note that this is not a kernel limitation; it is how the allocator is designed to work. It is the caller's responsibility to maintain statistics such as how many pages it has allocated.
- Disabling the ballooning driver can confirm that the otherwise unaccounted memory is being used by it.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.