Re: [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)

From: Matthew Wilcox
Date: Sat Jul 19 2008 - 08:13:43 EST


On Sat, Jul 19, 2008 at 12:37:11PM +0200, Andi Kleen wrote:
> Russ Anderson <rja@xxxxxxx> writes:
>
> > [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)
>
> FWIW I discussed this with some hardware people and the general
> opinion was that it was way too aggressive to disable a page on the
> first corrected error like this patchkit currently does.

I think it's reasonable to take a page out of service on the first error.
Then a user program needs to be notified of which bit is suspected.
It can then subject that page to an intense set of tests (I'd start
by stealing the ones from memtest86+) and if no more errors are found,
it could return the page to service.
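
A minimal sketch of the kind of pattern test such a user program could run
on the suspect page. It is not taken from the patchkit or from memtest86+.
The names test_suspect_page, check_fill and PAGE_WORDS are hypothetical, and
a 4 KiB page is assumed. Plain userspace accesses may be satisfied from the
CPU cache rather than DRAM, so a real tester would also need to defeat
caching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, tested as 64-bit words. */
#define PAGE_WORDS (4096 / sizeof(uint64_t))

/* Fill the page with `pattern`, read it back, count mismatching words. */
static size_t check_fill(volatile uint64_t *page, size_t words,
                         uint64_t pattern)
{
    size_t bad = 0;

    for (size_t i = 0; i < words; i++)
        page[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (page[i] != pattern)
            bad++;
    return bad;
}

/*
 * Hypothetical helper: run a few memtest-style passes over one page.
 * `suspect_bit` is the bit position reported with the corrected error.
 * Returns the total number of mismatching words observed.
 */
static size_t test_suspect_page(volatile uint64_t *page, size_t words,
                                unsigned suspect_bit)
{
    size_t bad = 0;

    assert(page != NULL && words > 0 && suspect_bit < 64);

    /* Exercise the suspected bit first: set alone, then cleared alone. */
    uint64_t mask = (uint64_t)1 << suspect_bit;
    bad += check_fill(page, words, mask);
    bad += check_fill(page, words, ~mask);

    /* Walking ones across every bit position. */
    for (unsigned b = 0; b < 64; b++)
        bad += check_fill(page, words, (uint64_t)1 << b);

    /* Own-index test: each word holds its index, then the complement. */
    for (size_t i = 0; i < words; i++)
        page[i] = (uint64_t)i;
    for (size_t i = 0; i < words; i++)
        if (page[i] != (uint64_t)i)
            bad++;
    for (size_t i = 0; i < words; i++)
        page[i] = ~(uint64_t)i;
    for (size_t i = 0; i < words; i++)
        if (page[i] != ~(uint64_t)i)
            bad++;

    return bad;
}
```

A return value of 0 over many repetitions would be the signal to hand the
page back; any non-zero count would argue for keeping it out of service.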

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."