Re: [PATCH 2/2] mm,migration: Fix race between shift_arg_pages and rmap_walk by guaranteeing rmap_walk finds PTEs created within the temporary stack

From: Mel Gorman
Date: Mon May 10 2010 - 10:03:04 EST


On Mon, May 10, 2010 at 09:42:38AM +0900, KAMEZAWA Hiroyuki wrote:
> On Sun, 9 May 2010 12:56:49 -0700 (PDT)
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> >
> >
> > On Sun, 9 May 2010, Mel Gorman wrote:
> > >
> > > It turns out not to be easy to preallocate the PUDs, PMDs and PTEs that
> > > move_page_tables() needs. To avoid overallocating, it has to follow the same
> > > logic as move_page_tables(), duplicating some code in the process. The ugliest
> > > aspect of all is passing those pre-allocated pages back into move_page_tables(),
> > > where they need to be passed down to such functions as __pte_alloc. It turns
> > > out extremely messy.
> >
> > Umm. What?
> >
> > That's crazy talk. I'm not talking about preallocating stuff in order to
> > pass it in to move_page_tables(). I'm talking about just _creating_ the
> > dang page tables early - preallocating them IN THE PROCESS VM SPACE.
> >
> > IOW, a patch like this (this is a pseudo-patch, totally untested, won't
> > compile, yadda yadda - you need to actually make the people who call
> > "move_page_tables()" call that prepare function first etc etc)
> >
> > Yeah, if we care about holes in the page tables, we can certainly copy
> > more of the move_page_tables() logic, but it certainly doesn't matter for
> > execve(). This just makes sure that the destination page tables exist
> > first.
> >
> IMHO, I think move_page_tables() itself should be implemented as your patch.
>
> But, move_page_tables()'s failure is not a big problem. At failure,
> exec will abort and no page fault will occur later. What we have to do in
> this migration-patch-series is avoiding inconsistent updates of the set
> [page, vma->vm_start, vma->vm_pgoff, ptes], or "don't migrate pages in
> exec's stack".
>
> Considering cost, as Mel shows, "don't migrate pages in exec's stack" seems
> reasonable. But, I still doubt this check.
>
> +static bool is_vma_temporary_stack(struct vm_area_struct *vma)
> +{
> +	int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);
> +
> +	if (!maybe_stack)
> +		return false;
> +
> +	/* If only the stack is mapped, assume exec is in progress */
> +	if (vma->vm_mm->map_count == 1) -------------------(*)
> +		return true;
> +
> +	return false;
> +}
> +
>
> Mel, is (*) safe even for a.out (i.e. binary formats other than ELF)?
>

I felt it was safe because this happens before search_binary_handler is
called to find a handler to load the binary. Still, the suggestion to
use an impossible combination of VMA flags is more robust against any
future change.
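
To make the flag-based alternative concrete, here is a rough user-space
sketch of what such a check could look like. It is not the patch itself.
The struct mock_vma type, the flag values, the marker name
VM_STACK_INCOMPLETE_SETUP and the choice of VM_SEQ_READ | VM_RAND_READ as
the "impossible" pair are all assumptions made for illustration; the pair
was picked only because the sequential and random read hints are mutually
exclusive.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for a user-space mock, not the kernel headers. */
#define VM_GROWSDOWN 0x00000100UL
#define VM_GROWSUP   0x00000200UL
#define VM_SEQ_READ  0x00008000UL
#define VM_RAND_READ 0x00010000UL

/* Assumed marker: both read hints set at once, which a normal VMA never has. */
#define VM_STACK_INCOMPLETE_SETUP (VM_SEQ_READ | VM_RAND_READ)

/* Hypothetical minimal stand-in for struct vm_area_struct. */
struct mock_vma {
	unsigned long vm_flags;
};

bool is_vma_temporary_stack(const struct mock_vma *vma)
{
	unsigned long maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);

	if (!maybe_stack)
		return false;

	/* Both bits of the marker must be set; one hint alone is a normal VMA. */
	return (vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
	       VM_STACK_INCOMPLETE_SETUP;
}

/* Walk through the assumed lifecycle: marked during exec, cleared afterwards. */
void mock_exec_stack_lifecycle(void)
{
	struct mock_vma vma = {
		.vm_flags = VM_GROWSDOWN | VM_STACK_INCOMPLETE_SETUP,
	};

	/* While the stack is still being set up, the VMA is skipped. */
	assert(is_vma_temporary_stack(&vma));

	/* Once the stack has been moved into place, drop the marker. */
	vma.vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
	assert(!is_vma_temporary_stack(&vma));
}
```

Unlike the map_count == 1 test, a check of this shape does not depend on
how many VMAs a binary handler has mapped, only on the marker being set
when the temporary stack is created and cleared once it is in place.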

> <SNIP>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab