Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve

From: David Gibson
Date: Sun Aug 04 2013 - 05:48:42 EST

On Wed, Jul 31, 2013 at 02:37:53PM +0900, Joonsoo Kim wrote:
> Hello, David.
> On Mon, Jul 29, 2013 at 05:28:23PM +1000, David Gibson wrote:
> > On Mon, Jul 29, 2013 at 02:32:08PM +0900, Joonsoo Kim wrote:
> > > If parallel faults occur, we can fail to allocate a hugepage,
> > > because many threads dequeue a hugepage to handle a fault on the same address.
> > > This causes a shortage in the reserved pool for a short while, and that causes
> > > a faulting thread which is guaranteed to have enough reserved hugepages
> > > to get a SIGBUS signal.
> >
> > It's not just about reserved pages. The same race can happen
> > perfectly well when you're really, truly allocating the last hugepage
> > in the system.
> Yes, you are right.
> This is a critical comment on this patchset :(
> IIUC, the case you mentioned is about tasks which have a mapping
> with MAP_NORESERVE.

Any mapping that doesn't use the reserved pool, not just
MAP_NORESERVE. For example, if a process makes a MAP_PRIVATE mapping
and then fork()s, and the mapping is then instantiated in the child,
that instantiation will not draw from the reserved pool.

> Should we ensure that they can allocate the last hugepage?
> They map a region with MAP_NORESERVE, so they shouldn't assume that
> their requests always succeed.

If the pages are available, people get cranky if the allocation fails
for no apparent reason, MAP_NORESERVE or not. They get especially
cranky if it sometimes fails and sometimes doesn't due to a race
condition.
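A minimal sketch, assuming a much-simplified model of the fault path, may make the race concrete. This is a hypothetical Python simulation, not the hugetlb code: `Pool`, `fault`, `run` and `race` are invented names, each bare `yield` marks a point where another faulting thread may run, reservation accounting is reduced to a single counter, and the "retry" is assumed to mean going back to re-check the page table after a failed dequeue with use_reserve.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy hugepage pool: free pages plus a count promised to reservations."""
    free: list
    resv: int = 0

    def dequeue(self, use_reserve):
        # Without a reservation, a fault may not take pages promised to others.
        if not use_reserve and len(self.free) <= self.resv:
            return None
        if not self.free:
            return None
        if use_reserve:
            self.resv -= 1
        return self.free.pop()

    def enqueue(self, page, use_reserve):
        # Give back a page (and its reservation) after losing the race.
        self.free.append(page)
        if use_reserve:
            self.resv += 1


def fault(pool, ptes, addr, use_reserve, retry):
    """One faulting thread; each bare `yield` lets another thread run."""
    while True:
        if addr in ptes:                 # already instantiated by someone else
            return "ok"
        yield                            # lookup done, nothing allocated yet
        page = pool.dequeue(use_reserve)
        if page is None:
            if retry and use_reserve:    # hypothetical retry: look again later
                yield
                continue
            return "SIGBUS"
        yield                            # page in hand, not yet mapped
        if addr in ptes:                 # lost the race: hand the page back
            pool.enqueue(page, use_reserve)
            return "ok"
        ptes[addr] = page
        return "ok"


def run(threads, order):
    """Step the threads in `order`, then round-robin until all return."""
    results = [None] * len(threads)

    def step(i):
        try:
            next(threads[i])
        except StopIteration as stop:
            results[i] = stop.value

    for i in order:
        if results[i] is None:
            step(i)
    while None in results:
        for i in range(len(threads)):
            if results[i] is None:
                step(i)
    return results


def race(free_pages, resv, use_reserve, retry):
    """Two threads fault on one address: both miss, A dequeues, B dequeues."""
    pool = Pool(free=list(range(free_pages)), resv=resv)
    ptes = {}
    threads = [fault(pool, ptes, 0x200000, use_reserve, retry)
               for _ in range(2)]
    return run(threads, [0, 1, 0, 1])


print(race(1, 1, use_reserve=True, retry=False))   # ['ok', 'SIGBUS']
print(race(1, 1, use_reserve=True, retry=True))    # ['ok', 'ok']
print(race(1, 0, use_reserve=False, retry=True))   # ['ok', 'SIGBUS']
```

In this model the retry rescues the thread holding a reservation, because the page it is waiting for ends up mapped at the same address. A thread faulting on the last unreserved page still gets SIGBUS in exactly the same interleaving, which is the case described above.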

