Re: [PATCH] hugetlb: Fix pool resizing corner case

From: Dave Hansen
Date: Wed Oct 03 2007 - 14:59:42 EST


On Wed, 2007-10-03 at 13:33 -0500, Adam Litke wrote:
> On Wed, 2007-10-03 at 10:40 -0700, Dave Hansen wrote:
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 84c795e..7af3908 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
> > > for (i = 0; i < MAX_NUMNODES; ++i) {
> > > struct page *page, *next;
> > > list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> > > + if (count >= nr_huge_pages)
> > > + return;
> > > if (PageHighMem(page))
> > > continue;
> > > list_del(&page->lru);
> > > update_and_free_page(page);
> > > free_huge_pages--;
> > > free_huge_pages_node[page_to_nid(page)]--;
> > > - if (count >= nr_huge_pages)
> > > - return;
> > > }
> > > }
> > > }
> >
> > That's an excellent problem description. I'm just a bit hazy on how the
> > patch fixes it. :)
> >
> > What is the actual error in this loop? The fact that we can go trying
> > to free pages when the count is actually OK?
>
> The above hunk serves only to change the behavior of try_to_free_low()
> so that rather than always freeing _at_least_ one huge page, it can
> return without having freed any pages.

OK, that makes sense. Can you include that in the patch description?

> > BTW, try_to_free_low(count) kinda sucks for a function name. Is that
> > count the number of pages we're trying to end up with, or the total
> > number of low pages that we're trying to free?
>
> I agree the name sucks, but this is a bugfix patch.

Oh, yeah, none of what I was suggesting belongs in this patch. But,
they'd make good followups. I was just trying to get you to look at
what caused that bug to be introduced in the first place.

> > We could rewrite the original max() line this way:
> >
> > if (resv_huge_pages > nr_of_pages_to_end_up_with)
> > nr_of_pages_to_end_up_with = resv_huge_pages;
> > try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);
> >
> > Which makes it more clear that you're setting the number of total pages
> > to the number of reserved pages, which is obviously screwy.
> >
> > OK, so this is actually saying: "count can never go below
> > resv_huge_pages+nr_huge_pages"?
>
> Not quite. Count can never go below the number of reserved pages plus
> pages allocated to MAP_PRIVATE mappings. That number is computed by:
> (resv + (total - free)).

So, (total - free) equals the number of MAP_PRIVATE pages? Does that
imply that all reserved pages are shared and that all shared pages are
reserved?

This would be clearer even with

static inline int nr_map_private_hpages(void)
{
return total - free;
}

Especially, if we could get the "shared" name somewhere into the
reserved variable.

It makes a whole ton of sense to see:

int total_in_use = nr_shared() + nr_private();
if (nr_requested_pages < total_in_use)
uh_oh_cant_do_that();

That'd be a wonderful cleanup.

> > Could we change try_to_free_low() to free a distinct number of pages?
> >
> > if (count > free_huge_pages)
> > count = free_huge_pages;
> > try_to_free_nr_huge_pages(count);
> >
> > I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
> > free_huge_pages" logic. Could you elaborate a bit there on what the
> > rules are?
>
> The key is that we don't want to shrink the pool below the number of
> pages we are committed to keeping around. Before this patch, we only
> accounted for the pages we plan to hand out (reserved huge pages) but
> not the ones we've already handed out (total - free). Does that make
> sense?

Yes. I'm just starting to get the idea that pages are committed in
different ways. So, it makes good sense to have a nice function like
total_committed_hpages(), which explains all of this, and is called to
compare agaist the evil "count" variable. :)

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/