Re: [PATCH 2/2] mm,migration: Fix race between shift_arg_pages and rmap_walk by guaranteeing rmap_walk finds PTEs created within the temporary stack

From: KAMEZAWA Hiroyuki
Date: Sun May 09 2010 - 20:36:57 EST


On Sun, 9 May 2010 20:21:45 +0100
Mel Gorman <mel@xxxxxxxxx> wrote:

> On Thu, May 06, 2010 at 07:12:59PM -0700, Linus Torvalds wrote:
> >
> >
> > On Fri, 7 May 2010, KAMEZAWA Hiroyuki wrote:
> > >
> > > IIUC, move_page_tables() may call "page table allocation" and it cannot be
> > > done under spinlock.
> >
> > Bah. It only does a "alloc_new_pmd()", and we could easily move that out
> > of the loop and pre-allocate the pmd's.
> >
> > If that's the only reason, then it's a really weak one, methinks.
> >
>
> It turns out not to be easy to preallocate the PUDs, PMDs and PTEs that
> move_page_tables() needs. To avoid overallocating, it has to follow the same
> logic as move_page_tables(), duplicating some code in the process. The ugliest
> aspect of all is passing those pre-allocated pages back into move_page_tables(),
> where they need to be passed down to such functions as __pte_alloc(). It gets
> extremely messy.
>
> I stopped working on it about half way through as it was already too ugly
> to live and would have similar cost to Kamezawa's much more straight-forward
> approach of using move_vma().
>
> While using move_vma is straight-forward and solves the problem, it's
> not as cheap as Andrea's solution. Andrea allocates a temporary VMA and
> puts it on a list and very little else. It didn't show up any problems
> in microbenchmarks. Calling move_vma does a lot more work particularly in
> copy_vma and this slows down exec.
>
> With Kamezawa's patch, kernbench was fine on wall time but in System Time,
> it slowed by up to 1.48% in comparison to Andrea's slowing by up to 0.64%[1].
>
> aim9 was slowed as well. Kamezawa's slowed by 2.77% where Andrea's reported
> faster by 2.58%. While AIM9 is flaky and these figures are barely outside
> the noise, calling move_vma() is obviously more expensive.
>

Thank you for testing.


> While my solution at http://lkml.org/lkml/2010/4/30/198 is cheapest as it
> does not touch exec() at all, is_vma_temporary_stack() could be broken in
> the future if any of the assumptions it makes change.
>
> So what you have is an inverse relationship between magic and
> performance. Mine has the most magic and is fastest. Kamezawa's has the
> least magic but is slowest, and Andrea's has the goldilocks factor. Which do
> you prefer?
>

I like the fastest one ;)

Thanks,
-Kame
