Re: Nick's core remove PageReserved broke vmware...

From: Petr Vandrovec
Date: Wed Nov 02 2005 - 13:07:10 EST


Hugh Dickins wrote:
On Wed, 2 Nov 2005, Petr Vandrovec wrote:

So we just marked the vma VM_RESERVED, as it
did not hurt, and all pages in this vma have refcount > 1 anyway so there is
no point in trying to clean up these page tables. Now rmap catches this by
page_count() != page_mapcount(), so VM_RESERVED is not needed anymore, but
there did not seem to be any reason to remove it.


Well, beware. That "page_count() != page_mapcount() + 2" check in rmap.c
went away in 2.6.13: the problem it was there to solve being solved
instead by a can_share_swap_page based on page_mapcount instead of
page_count (partly to fix a page migration progress problem).

So, you may still be in trouble? But how do the pages you're concerned
with come to be on the LRU in the first place? If they're not on the
LRU, vmscanning will never try to take them away. Most drivers with
special pages, and ->mapping unset, don't put the pages on the LRU.

No, we do not put pages on the LRU. Older kernels would remove page
entries from the page tables after they had been instantiated. That did
not cause any problems, but as the VM cannot gain anything from this
removal, we were happy to find that the VM did not clear page tables for
VM_RESERVED vmas. But the primary motivation for VM_RESERVED was to get
rid of a warning on SuSE 2.6.4. As one VMware instance uses 4 to 6 pages
allocated this way, it is really not a big problem even if the VM tries
to remove them from the page tables...

--- vmmon-only/linux/driver.c.orig 2005-11-02 02:00:46.000000000 +0100
+++ vmmon-only/linux/driver.c 2005-11-01 20:12:13.000000000 +0100
@@ -1283,9 +1283,13 @@
/*
* It seems that SuSE's 2.6.4-52 needs this. Hopefully
* it will not break anything else.
+ *
+ * It breaks on post 2.6.14 kernels, so get rid of it on them.
*/
#ifdef VM_RESERVED
+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 14)
vma->vm_flags |= VM_RESERVED;
+# endif
#endif
return 0;
}


Nick's PageReserved/VM_RESERVED changes are not in 2.6.14 so I'd expect
2.6.15 there. Ah, you're trying to handle this awkward interval before
2.6.15-rc1 brings the numbering up to 2.6.15, okay.
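To illustrate the version test being discussed, here is a small user-space sketch (not part of the patch). It redefines the classic KERNEL_VERSION() encoding locally so it compiles outside the kernel tree; the helper name wants_vm_reserved() is hypothetical. It shows that a tree still numbered 2.6.14, such as a snapshot taken before 2.6.15-rc1, already falls on the "do not set VM_RESERVED" side of the `< KERNEL_VERSION(2, 6, 14)` comparison.

```c
#include <assert.h>

/* Classic encoding used by the kernel's <linux/version.h>, redefined
 * here so the sketch builds in user space. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: returns 1 if a kernel whose LINUX_VERSION_CODE is
 * `code` would still get VM_RESERVED set by the patched driver, mirroring
 *     #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 14)
 */
static int wants_vm_reserved(int code)
{
    return code < KERNEL_VERSION(2, 6, 14);
}
```

With this test, 2.6.13 and older keep the flag, while 2.6.14 itself and everything newer drop it.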

For our purposes, anything between 2.6.6 and 2.6.14 is fine.
The allocated pages are normal memory pages; they were just allocated to
meet specific criteria (either coming from the low 4GB, accessible in
32-bit mode without paging, or being physically contiguous), so they all
have a 'struct page' and can be happily refcounted.
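The two allocation criteria can be written down as simple checks on page frame numbers. This is only a user-space sketch: the function names are hypothetical, and it assumes 4 KB pages (TOY_PAGE_SHIFT of 12). It is not how the driver itself verifies its allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KB pages */

/* Returns 1 if the whole page with frame number `pfn` lies below 4 GB. */
static int pfn_below_4gb(uint64_t pfn)
{
    return pfn < (UINT64_C(1) << (32 - TOY_PAGE_SHIFT));
}

/* Returns 1 if the n frame numbers form one physically contiguous run. */
static int pfns_contiguous(const uint64_t *pfn, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++) {
        if (pfn[i] != pfn[i - 1] + 1)
            return 0;
    }
    return 1;   /* zero or one page is trivially contiguous */
}
```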

The only really obscure thing we do is that we allocate order 1 or 2 (8/16KB)
regions, bump the refcount by 1 on all pages except the first one (if CONFIG_MMU
is set), and then pass independent pages from this region through the nopage
handler to the VM to map them into user space. This means that a region
is allocated by alloc_pages(GFP_USER, 2), while it is released by 4 calls
to put_page(), for page+0...page+3.
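The refcount bookkeeping described above can be modelled in user space. The sketch below is a toy, not kernel code: struct toy_page and the toy_* functions are hypothetical stand-ins, and it assumes that a non-compound higher-order allocation leaves only the first page with a count of 1, which is what the extra bump on the remaining pages implies.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_ORDER 2
#define REGION_PAGES (1u << REGION_ORDER)   /* 4 pages for order 2 */

/* Toy stand-in for struct page: just a reference count, 0 means free. */
struct toy_page {
    int count;
};

/* Models alloc_pages(GFP_USER, order) under the assumption stated above:
 * only the first page of the region starts with a reference. */
static void toy_alloc_region(struct toy_page *region, size_t npages)
{
    size_t i;

    region[0].count = 1;
    for (i = 1; i < npages; i++)
        region[i].count = 0;
}

/* Driver step: bump the count on every page except the first, so each
 * page can later be released on its own. */
static void toy_make_independent(struct toy_page *region, size_t npages)
{
    size_t i;

    for (i = 1; i < npages; i++)
        region[i].count++;
}

/* Taking and dropping a reference, e.g. when a page is mapped or unmapped. */
static void toy_get_page(struct toy_page *p) { p->count++; }

static void toy_put_page(struct toy_page *p)
{
    assert(p->count > 0);   /* dropping a free page would be a bug */
    p->count--;
}

/* Counts how many pages of the region have been fully released. */
static size_t toy_free_pages(const struct toy_page *region, size_t npages)
{
    size_t i, nfree = 0;

    for (i = 0; i < npages; i++)
        if (region[i].count == 0)
            nfree++;
    return nfree;
}
```

In this model a mapping that takes and later drops its own reference never frees a page, and the region goes away only after the four final toy_put_page() calls, one per page.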

It works on all Linux kernels with an MMU that I've tested, and it would be nice
if this could continue to work. Current kernels seem to provide PageCompound,
which would fit our needs as well after some changes, but as long as
PageCompound is a build-time option it is not possible to rely on it.

Best regards,
Petr Vandrovec
