Re: [Lse-tech] Re: Hugetlbpages in very large memory machines.......

From: Andrew Morton
Date: Sun Mar 14 2004 - 03:58:58 EST


Ray Bryant <raybry@xxxxxxx> wrote:
>
>
> I agree with the compatibility concern, but the other part of the problem
> is that while hugetlb_prefault() is running, it holds both the mm->mmap_sem in
> write mode and the mm->page_table_lock. So not only does it take 500 s for
> the mmap() to return on our test system, but ps, top, etc all freeze for the
> duration. Very irritating, especially on a 64 or 128 P system.

Well that's just a dumb implementation. hugetlb_prefault() doesn't need
page_table_lock while it is zeroing the page: just drop the lock there and
test for -EEXIST returned from add_to_page_cache().

In fact we need to do that anyway: the current code is buggy if some other
process with a different mm gets in there and instantiates the page in the
pagecache before this process does: hugetlb_prefault() will return -EEXIST
instead of simply accepting the race and using the page which someone else
put there.

After we have the page in the pagecache we need to retake page_table_lock
and check that the target pte is still pte_none(). If it is not, you know
that some other thread has already instantiated a pte there, so the new ref
to the pagecache page can simply be dropped. See how do_no_page() handles it.
Of course, this only applies if mmap_sem is no longer held in there.
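Putting the two halves together, the loop could end up looking something
like this. Rough sketch only, not a tested patch: the helper names
(alloc_hugetlb_page, huge_pte_alloc, set_huge_pte, free_huge_page) only
approximate the 2.6-era i386 hugetlbpage.c code, and the reference counting
is simplified.

int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) {
		unsigned long idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) +
			(vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
		struct page *page;
		pte_t *pte;
		int err;

retry:
		/* Allocate and zero the huge page with no spinlock held;
		 * GFP_KERNEL is OK because page_table_lock is not taken yet. */
		page = find_get_page(mapping, idx);
		if (!page) {
			page = alloc_hugetlb_page();	/* zeroes the page */
			if (!page)
				return -ENOMEM;
			err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
			if (err == -EEXIST) {
				/* Raced with another mm: accept the page
				 * which the other process instantiated. */
				free_huge_page(page);
				goto retry;
			}
			if (err) {
				free_huge_page(page);
				return err;
			}
			unlock_page(page);
		}

		/* Retake page_table_lock only to install the pte. */
		spin_lock(&mm->page_table_lock);
		pte = huge_pte_alloc(mm, addr);
		if (!pte) {
			spin_unlock(&mm->page_table_lock);
			page_cache_release(page);
			return -ENOMEM;
		}
		if (!pte_none(*pte)) {
			/* Some other thread beat us to it: just drop the
			 * new pagecache reference, as do_no_page() does. */
			page_cache_release(page);
		} else {
			set_huge_pte(mm, vma, page, pte,
				     vma->vm_flags & VM_WRITE);
		}
		spin_unlock(&mm->page_table_lock);
	}
	return 0;
}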

As for holding mmap_sem for too long, well, that can presumably be worked
around by not mmapping the whole lot in one hit?
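For illustration, a minimal userspace sketch of that workaround: the path,
sizes and the 2MB huge page size are assumptions, and each mmap() call then
only prefaults its own chunk, so mmap_sem and page_table_lock are held for a
correspondingly shorter time per call.

/* Map a hugetlbfs file in pieces rather than in one mmap() call.  The
 * chunks are not forced to be virtually contiguous in this sketch. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(2UL * 1024 * 1024)
#define CHUNK		(64 * HPAGE_SIZE)	/* prefault 128MB per mmap() */
#define TOTAL		(512 * HPAGE_SIZE)	/* 1GB in all */

int main(void)
{
	off_t off;
	int fd = open("/mnt/huge/data", O_RDWR | O_CREAT, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (off = 0; off < TOTAL; off += CHUNK) {
		/* offset and length must both be multiples of the huge
		 * page size */
		void *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, off);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
	}
	close(fd);
	return 0;
}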
