How to find out if VMware's ballooning driver is consuming memory?

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5
  • VMware ballooning driver (vmmemctl)

Issue

  • The VMware ballooning driver appears to be consuming memory, but there is no obvious way to prove it.

Resolution

  • From vmcore analysis:

  • In short, a review of mm/slab.c leads to this conclusion: the caller of __alloc_pages() (the low-level page allocator) is responsible for maintaining statistics on the number of pages allocated to it. The slab subsystem, for example, does this.

  • Let's take a look at the back trace of the vmmemctl kernel process. Notice the call to
    the __alloc_pages() routine:

	crash> bt ffff8101fb5a37a0
	PID: 2789   TASK: ffff8101fb5a37a0  CPU: 2   COMMAND: "vmmemctl"
	 #0 [ffff8101fb467980] schedule at ffffffff80062fa0
	 #1 [ffff8101fb467a58] __cond_resched at ffffffff8009023a
	 #2 [ffff8101fb467a68] cond_resched at ffffffff800630d5
	 #3 [ffff8101fb467a78] shrink_inactive_list at ffffffff800cd951
	 #4 [ffff8101fb467c68] shrink_zone at ffffffff800132bc
	 #5 [ffff8101fb467ca8] try_to_free_pages at ffffffff800ce8ba
	 #6 [ffff8101fb467d38] __alloc_pages at ffffffff8000f5f6
	 #7 [ffff8101fb467da8] alloc_page_interleave at ffffffff800238e7
	 #8 [ffff8101fb467dc8] OS_ReservedPageAlloc at ffffffff88413584 [vmmemctl]
	 #9 [ffff8101fb467dd8] Balloon_QueryAndExecute at ffffffff88413b88 [vmmemctl]
	#10 [ffff8101fb467e58] Balloon_QueryAndExecute at ffffffff88413a67 [vmmemctl]
	#11 [ffff8101fb467e68] OS_Yield at ffffffff88413696 [vmmemctl]
	#12 [ffff8101fb467ee8] kthread at ffffffff80032755
	#13 [ffff8101fb467f48] kernel_thread at ffffffff8005dfb1
  • Notice that pages are being requested from the Normal zone:

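    • Register rsi, which carries the second argument to shrink_zone(), is loaded from register rbx, which holds the pointer to the zone object
    • shrink_zone() then saves rbx on the stack, in the row that starts at 0xffff8101fb467c70 (the second quadword of that row, marked below) [*]

    [*]: rbx is not modified between 'try_to_free_pages+0x186' and
    'shrink_zone+0xf', so the saved value is still the zone pointer.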

							// rsi = rbx (zone)	
	0xffffffff800ce8a8 <try_to_free_pages+0x179>:	mov    %rbx,%rsi
	...						// shrink_zone()
	0xffffffff800ce8b5 <try_to_free_pages+0x186>:	callq  0xffffffff80013195
	...
							// save rbx on stack
	0xffffffff800131a4 <shrink_zone+0xf>:		push   %rbx

						     zone
	    ffff8101fb467c70: 0000000c000dffce ffff81000001a600
	    				       ^^^^^^^^^^^^^^^^ 
	    ffff8101fb467c80: 000000000000000c 0000000000000000 
	    ffff8101fb467c90: 000000000000000c ffff8101fb467d60 
	    ffff8101fb467ca0: 0000000000000000 ffffffff800ce8ba 

	crash> p -x ((struct zone *)0xffff81000001a600)->name
	$2 = 0xffffffff802bca0f "Normal"

  • 270 tasks are in the page frame reclaim code for this zone, and vmmemctl is one of them. It entered through the direct reclaim path, as indicated by PF_MEMALLOC being set in its tsk->flags field. Note also that try_to_free_pages() is called directly in the context of 'vmmemctl'. This function is only called when free memory pages are in seriously short supply.

	crash> p -d ((struct zone *)0xffff81000001a600)->reclaim_in_progress
	$3 = {
	  counter = 270
	}
	
	crash> p -x ((struct task_struct *)0xffff8101fb5a37a0)->flags
	$4 = 0x10000840 = (PF_MEMALLOC|PF_MEMPOLICY|PF_FORKNOEXEC)
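
The annotation on the flags value above can be reproduced mechanically. The following is a minimal sketch, assuming the PF_* bit values from include/linux/sched.h of 2.6.18-era kernels; only the three flags named in the output are listed, and decode_task_flags is a hypothetical helper name, not a crash command.

```python
# PF_* bit values, assumed from include/linux/sched.h (2.6.18-era kernels).
# Only the three flags named in the crash output are listed; this is
# deliberately not the full table.
PF_FLAGS = {
    0x00000040: "PF_FORKNOEXEC",  # forked but did not exec
    0x00000800: "PF_MEMALLOC",    # allocating memory
    0x10000000: "PF_MEMPOLICY",   # non-default NUMA mempolicy
}


def decode_task_flags(flags):
    """Return (names, leftover); leftover holds any bits not in PF_FLAGS."""
    names = [name for bit, name in sorted(PF_FLAGS.items()) if flags & bit]
    known = sum(bit for bit in PF_FLAGS if flags & bit)
    return names, flags & ~known


names, leftover = decode_task_flags(0x10000840)
print("|".join(names), hex(leftover))
# prints: PF_FORKNOEXEC|PF_MEMALLOC|PF_MEMPOLICY 0x0
```

A leftover of zero means every set bit was recognised. A non-zero leftover would point to flags missing from this abbreviated table.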
  • So does the kernel account for memory used by a particular kernel module that calls __alloc_pages() implicitly?

The answer is no, it does not. The caller has to maintain these statistics itself. __alloc_pages() has no concept of what the pages are being requested for, and none of its arguments says anything about how the pages will be used.
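
The division of responsibility described here can be modelled in a few lines. This is a toy userspace sketch, not kernel code: PageAllocator, AccountingCaller, SilentCaller and unaccounted_pages are hypothetical names invented for the illustration, and the page counts are arbitrary.

```python
class PageAllocator:
    """Toy stand-in for __alloc_pages(): hands out pages, records no owner."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.free_pages = total_pages

    def alloc_pages(self, order):
        nr_pages = 1 << order
        if nr_pages > self.free_pages:
            return 0  # allocation failed, nothing handed out
        self.free_pages -= nr_pages
        return nr_pages


class AccountingCaller:
    """Caller that keeps its own counter, as the slab code does with NR_SLAB."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.nr_pages = 0

    def grow(self, order):
        self.nr_pages += self.allocator.alloc_pages(order)


class SilentCaller:
    """Caller that takes pages but publishes no counter."""

    def __init__(self, allocator):
        self.allocator = allocator

    def grow(self, order):
        self.allocator.alloc_pages(order)


def unaccounted_pages(allocator, counters):
    """Pages in use that no published counter explains."""
    used = allocator.total_pages - allocator.free_pages
    return used - sum(counters)


allocator = PageAllocator(total_pages=1024)
slab_like = AccountingCaller(allocator)
balloon_like = SilentCaller(allocator)
slab_like.grow(order=3)      # 8 pages, counted by the caller
balloon_like.grow(order=5)   # 32 pages, counted by nobody
print(unaccounted_pages(allocator, [slab_like.nr_pages]))  # prints: 32
```

The allocator only knows how many pages are gone. Attributing them to an owner is possible only for callers that kept a counter.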

  • What is known is that VMware's 'vmmemctl' behaves in this manner ("balloons") by design, so it is quite possibly the culprit.

  • The slab allocator is a good example: kmem_getpages() maintains its own statistics by calling add_zone_page_state(page_zone(page), NR_SLAB, nr_pages) once its call to __alloc_pages() (via alloc_pages_node()) succeeds:

	static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
	{
	        struct page *page;
	        int nr_pages;
	        int i;
	
	#ifndef CONFIG_MMU
	        /*   
	         * Nommu uses slab's for process anonymous memory allocations, and thus
	         * requires __GFP_COMP to properly refcount higher order allocations
	         */
	        flags |= __GFP_COMP;
	#endif
	        flags |= cachep->gfpflags;
	
	        page = alloc_pages_node(nodeid, flags, cachep->gfporder);
	        if (!page)
	                return NULL;
	
	        nr_pages = (1 << cachep->gfporder);
	        if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
	                atomic_add(nr_pages, &slab_reclaim_pages);
	        add_zone_page_state(page_zone(page), NR_SLAB, nr_pages);
	        for (i = 0; i < nr_pages; i++) 
	                __SetPageSlab(page + i);
	        return page_address(page);
	}

The code flow is similar to:

	-> kmem_getpages()
	  -> alloc_pages_node()
	    -> __alloc_pages()
	    <- __alloc_pages()
	  <- alloc_pages_node()
	  -> add_zone_page_state()

The resulting NR_SLAB count shows up in the VM statistics:

	crash> kmem -V
	  VM_STAT:
	          NR_ANON_PAGES: 724169
	         NR_FILE_MAPPED: 838
	          NR_FILE_PAGES: 204681

	                NR_SLAB: 15177
	                         ^^^^^
	           NR_PAGETABLE: 32088
	          NR_FILE_DIRTY: 0
	           NR_WRITEBACK: 0
	        NR_UNSTABLE_NFS: 0
	              NR_BOUNCE: 0
	               NUMA_HIT: 21025628202
	              NUMA_MISS: 0
	           NUMA_FOREIGN: 0
	    NUMA_INTERLEAVE_HIT: 966444526
	             NUMA_LOCAL: 21025628202
	             NUMA_OTHER: 0
  • So the kernel has no mechanism for finding out how much memory was requested through the alloc_pages() routine, and vmmemctl uses alloc_pages(). Note that this is not a limitation of the kernel; it is simply how the allocator works. It is the caller's responsibility to maintain statistics such as how many pages it has used.
  • Disabling the ballooning driver can confirm that the unaccounted memory is being used by it: if the memory becomes available again once the driver is disabled, the balloon was the consumer.
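
Before disabling the driver, the size of the gap can be estimated from the `kmem -V` counters shown earlier. The following is a rough sketch under stated assumptions: 4 KiB pages (x86_64), total and free page counts supplied by the reader from elsewhere in the vmcore (they are not part of the output above), and hypothetical helper names. The result is only an upper bound on balloon usage, because other uncounted kernel allocations fall into the same gap.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as on x86_64

# Page counters copied from the `kmem -V` output in this article.
VM_STAT = """
          NR_ANON_PAGES: 724169
         NR_FILE_MAPPED: 838
          NR_FILE_PAGES: 204681
                NR_SLAB: 15177
           NR_PAGETABLE: 32088
"""


def parse_vm_stat(text):
    """Parse 'NAME: value' lines into a dict of integer page counts."""
    pattern = r"^[ \t]*([A-Z_]+):[ \t]*(\d+)[ \t]*$"
    return {m.group(1): int(m.group(2))
            for m in re.finditer(pattern, text, re.M)}


def pages_to_mib(pages):
    return pages * PAGE_SIZE_KIB / 1024.0


def accounted_pages(stats):
    # NR_FILE_MAPPED is left out on the assumption that mapped file pages
    # are already included in NR_FILE_PAGES.
    keys = ("NR_ANON_PAGES", "NR_FILE_PAGES", "NR_SLAB", "NR_PAGETABLE")
    return sum(stats[k] for k in keys)


def unaccounted_pages(stats, total_pages, free_pages):
    """Pages in use that none of the counters above explain."""
    return total_pages - free_pages - accounted_pages(stats)


stats = parse_vm_stat(VM_STAT)
print(accounted_pages(stats))                  # prints: 976115
print(round(pages_to_mib(stats["NR_SLAB"]), 1))  # prints: 59.3
```

Calling unaccounted_pages(stats, total_pages, free_pages) with the total and free page counts taken from the same vmcore then gives the number of pages that no counter explains.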

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.