Re: [PATCHv12 2/4] zbud: add to mm/

From: Seth Jennings
Date: Thu May 30 2013 - 13:44:04 EST


On Wed, May 29, 2013 at 02:29:04PM -0700, Andrew Morton wrote:
> On Wed, 29 May 2013 14:09:02 -0700 (PDT) Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:
>
> > > memory_failure() is merely an example of a general problem: code which
> > > reads from the memmap[] array and expects its elements to be of type
> > > `struct page'. Other examples might be memory hotplugging, memory leak
> > > checkers etc. I have vague memories of out-of-tree patches
> > > (bigphysarea?) doing this as well.
> > >
> > > It's a general problem to which we need a general solution.
> >
> > <Obi-tmem Kenobe slowly materializes... "use the force, Luke!">
> >
> > One could reasonably argue that any code that makes incorrect
> > assumptions about the contents of a struct page structure is buggy
> > and should be fixed.
>
> Well it has type "struct page" and all code has a right to expect the
> contents to match that type.
>
> > Isn't the "general solution" already described
> > in the following comment, excerpted from include/linux/mm.h, which
> > implies that "scribbling on existing pageframes" [carefully], is fine?
> > (And, if not, shouldn't that comment be fixed, or am I misreading
> > it?)
> >
> > <start excerpt>
> > * For the non-reserved pages, page_count(page) denotes a reference count.
> > * page_count() == 0 means the page is free. page->lru is then used for
> > * freelist management in the buddy allocator.
> > * page_count() > 0 means the page has been allocated.
>
> Well kinda maybe. How all the random memmap-peekers handle this I do
> not know. Setting PageReserved is a big hammer which should keep other
> little paws out of there, although I guess it's abusive of whatever
> PageReserved is supposed to mean.
>
> It's what we used to call a variant record. The tag is page.flags and
> the protocol is, umm,
>
> PageReserved: doesn't refer to a page at all - don't touch
> PageSlab: belongs to slab or slub
> !PageSlab: regular kernel/user/pagecache page

In the !PageSlab case, the page _count has to be considered to determine if the
page is a free page or if it is an allocated non-slab page.

So looking at the fields that need to remained untouched in the struct page for
the memmap-peekers, they are
- page->flags
- page->_count

Is this correct?

>
> Are there any more?
>
> So what to do here? How about
>
> - Position the zbud fields within struct page via the preferred
> means: editing its definition.
>
> - Decide upon and document the means by which the zbud variant is tagged

I'm not sure if there is going to be a way to tag zbud pages in particular
without using a page flag. However, if we can tag it as a non-slab allocated
kernel page with no userspace mappings, that could be sufficient. I think this
can be done with:

!PageSlab(p) && page_count(p) > 0 && page_mapcount(p) <= 0

An alternative is to set PG_slab for zbud pages then we get all the same
treatment as slab pages, which is basically what we want. Setting PG_slab
also conveys that no assumption can be made about the contents of _mapcount.

However, a memmap-peeker could call slab functions on the page which obviously
won't be under the control of the slab allocator. Afaict though, it doesn't
seem that any of them do this since there aren't any functions in the slab
allocator API that take raw struct pages. The worst I've seen is calling
shrink_slab in an effort to get the slab allocator to free up the page.

In summary, I think that maintaining a positive page->_count and setting
PG_slab on zbud pages should provide safety against existing memmap-peekers.

Do you agree?

Seth

>
> - Demonstrate how this is safe against existing memmap-peekers
>
> - Do all this without consuming another page flag :)
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/