Re: Hugetlbpages in very large memory machines.......

From: Andrew Morton
Date: Sat Mar 13 2004 - 21:48:06 EST


Andi Kleen <ak@xxxxxxx> wrote:
>
> > We've looked at allocating and zeroing hugetlbpages at fault time, which
> > would at least allow multiple processors to be thrown at the problem.
> > Question is, has anyone else been working on
> > this problem and might they have prototype code they could share with us?
>
> Yes. I ran into exactly this problem with the NUMA API too.
> mbind() runs after mmap(), but it can no longer do anything
> once the pages have already been allocated.
>
> I fixed it on x86-64/i386 by allocating the pages lazily.
> Doing it for IA64 has been on the todo list too.
>
> i386/x86-64 code attached as an example.
>
> One drawback is that the out-of-memory handling is a lot less nice
> than it was before - when you run out of hugepages you now get SIGBUS
> instead of an ENOMEM from mmap. Maybe some prereservation would make
> sense, but that would be somewhat harder. Alternatively, fall back to
> smaller pages where possible (I was told that isn't easily done on
> IA64).

Demand-paging the hugepages is a decent feature to have, and ISTR
resisting it before for exactly this reason (SIGBUS on first touch
instead of ENOMEM from mmap).
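
To make the failure-mode change concrete, here is a rough userspace
illustration - the /mnt/huge mount point, the file name and the 2MB huge
page size are assumptions for the sketch, not anything from Andi's patch.
With prefaulting, an overcommitted mmap() fails with ENOMEM up front; with
lazy allocation the same mmap() succeeds and the shortfall only shows up
as SIGBUS when the pages are first touched.

/*
 * Rough illustration only: with prefaulting, the mmap() below fails
 * with ENOMEM when the hugetlb pool is too small; with lazy
 * allocation it succeeds and the first touch may raise SIGBUS.
 * /mnt/huge and the 2MB page size are assumptions.
 */
#include <stdio.h>
#include <string.h>
#include <setjmp.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* assumed huge page size */

static sigjmp_buf bus_jmp;

static void bus_handler(int sig)
{
	siglongjmp(bus_jmp, 1);
}

int main(void)
{
	size_t len = 64 * HPAGE_SIZE;		/* deliberately large */
	struct sigaction sa;
	char *p;
	int fd;

	fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open hugetlbfs file");
		return 1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = bus_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, NULL);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");			/* old behaviour: ENOMEM here */
		return 1;
	}

	if (sigsetjmp(bus_jmp, 1)) {
		fprintf(stderr, "SIGBUS: ran out of hugepages on touch\n");
		return 1;			/* lazy behaviour: failure here */
	}
	memset(p, 0, len);			/* faults the huge pages in */
	printf("all hugepages allocated\n");
	return 0;
}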

Even though it's early in the 2.6 series, I'd be a bit worried about
breaking existing hugetlb users in this way. Yes, the pages are
preallocated, so it is unlikely that a working setup will suddenly
break - unless someone is using the return value from mmap to find out
how many hugepages they can get (a probe along the lines of the sketch
below).
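
The sort of (hypothetical) usage I have in mind looks like this: probe
with successively smaller mappings until mmap() stops returning ENOMEM
and treat that as the number of available hugepages. With lazy
allocation the first attempt always succeeds and the answer becomes
meaningless.

/*
 * Hypothetical probe that relies on mmap() returning ENOMEM when the
 * hugetlb pool is too small.  fd is an open hugetlbfs file and
 * hpage_size the huge page size, as in the sketch above.
 */
static long probe_hugepages(int fd, size_t hpage_size, long max_pages)
{
	long n;

	for (n = max_pages; n > 0; n--) {
		void *p = mmap(NULL, n * hpage_size,
			       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			munmap(p, n * hpage_size);
			return n;	/* largest mapping that fit */
		}
	}
	return 0;
}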

So ho-hum. I think it needs to be back-compatible. Could we add
MAP_NO_PREFAULT?
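
Something along these lines is what I mean - note that MAP_NO_PREFAULT is
only a proposal in this mail, and the flag value below is invented purely
for illustration (fd and len as in the sketch above):

/* Hypothetical: MAP_NO_PREFAULT does not exist anywhere yet and the
 * value is made up for illustration. */
#define MAP_NO_PREFAULT	0x40000

/* Default: today's behaviour - pages prefaulted, ENOMEM at mmap() time. */
p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

/* Opt in: lazy allocation - mmap() succeeds, SIGBUS possible on touch. */
p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	 MAP_SHARED | MAP_NO_PREFAULT, fd, 0);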
