Re: How to handle a hugepage with bad physical memory?

From: Mel Gorman
Date: Wed Nov 16 2005 - 16:09:23 EST


On Wed, 16 Nov 2005, Robin Holt wrote:

> Russ Anderson recently introduced a patch into ia64 that changes MCA
> behavior. When the MCA is caused by a user reference to a users memory,
> we put an extra reference on the page and kill the user. This leaves
> the working memory available for other jobs while causing a leak of the
> bad page.
>
> I don't know if Russ has done any testing with hugetlbfs pages. I preface
> the remainder of my comments with a huge "I don't know anything"
> disclaimer.
>

Right now, I am not much of an improvement.

> With the new hugepages concept, would it be possible to only mark
> the default pagesize portion of a hugepage as bad and then return the
> remainder of the hugepage for normal use?

The process is dead so the mapping is not a concern. But that huge page is
gone and is no longer useful as a huge page. So, no, it cannot be used for
normal use.

What could be done is something like the following;

1. Go to the struct page that represents the start of the huge page
2. Clear all the flags and fields. Set the count to 1
3. Call __free_pages(smallpage, 0)

Do that for every struct page within the huge page except for the one that
the MCA flagged as bad. The side-effect will be that the buddy allocator
will merge them back to the largest possible blocks and place them on the
free lists. That will give you back all the small pages at least.

> What would we basically need
> to do to accomplish this? Are there patches in the community which we
> should wait to see how they progress before we do any work on this front?
>

Not that I am aware of but that does not mean they do not exist.

--
Mel Gorman
Part-time Phd Student Java Applications Developer
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/