Re: [PATCH 0/2] ia64: Migrate data off physical pages with correctable errors

From: Matthew Wilcox
Date: Mon Apr 28 2008 - 15:34:26 EST


On Mon, Apr 28, 2008 at 02:22:52PM -0500, Russ Anderson wrote:
> There is always an issue of how aggressive the code should be on
> migrating pages. Should it migrate on the first correctable error,
> or wait for some threshold? Reasonable people may disagree on the
> threshold and the "right" answer may be hardware specific. The
> decision making is confined to the cpe_migrate.c code. It is
> currently set to migrate on the first correctable error.

I think the kernel code should do the migration ASAP. But I think we
should have a list of 'bad' pages. We could then have a badram driver
that userspace can query to find out which pages are bad, map those
pages into a badram process, do various tests on them, and return the
pages to the pool if they're determined to be 'good'.
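The "do various tests on them" step could be a conventional pattern test. Below is a minimal userspace sketch. It assumes the page has already been mapped into the test process by the proposed (hypothetical) badram driver. The function name `badram_test_page` and the choice of patterns are illustrative assumptions, not an existing interface. The code writes all-zeros, all-ones, and two checkerboard patterns, then writes each word's own index, and after each pass reads back and counts mismatches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed patterns typical of memory testers: all zeros, all ones,
 * and two alternating-bit checkerboards. */
static const uint64_t patterns[] = {
    0x0000000000000000ULL,
    0xFFFFFFFFFFFFFFFFULL,
    0xAAAAAAAAAAAAAAAAULL,
    0x5555555555555555ULL,
};

/* Hypothetical helper: test one page that a (hypothetical) badram
 * driver has mapped into this process. Returns the number of words
 * that did not read back as written; 0 means the page passed and
 * badramd could ask for it to be returned to the pool. */
static size_t badram_test_page(volatile uint64_t *page, size_t len_bytes)
{
    size_t nwords = len_bytes / sizeof(uint64_t);
    size_t failures = 0;

    assert(len_bytes % sizeof(uint64_t) == 0);

    /* Fill the page with each fixed pattern, then verify it. */
    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
        for (size_t i = 0; i < nwords; i++)
            page[i] = patterns[p];
        for (size_t i = 0; i < nwords; i++)
            if (page[i] != patterns[p])
                failures++;
    }

    /* Address-in-address pass: each word holds its own index, which
     * catches addressing faults that fixed patterns cannot. */
    for (size_t i = 0; i < nwords; i++)
        page[i] = (uint64_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (page[i] != (uint64_t)i)
            failures++;

    return failures;
}
```

A userspace tester cannot see or scrub ECC state directly. A clean result therefore shows only that the page held these patterns during this run. A real badramd would probably need to repeat the test over time before trusting a page again.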

I could also see badramd having a list of pages found to be bad
in previous boots and asking the badram driver to take them out of
circulation early in boot before they've been allocated.
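To remember bad pages across boots, badramd only needs a simple persistent list. Here is a minimal sketch. It assumes an on-disk format of one page frame number (PFN) per line in hexadecimal. Both that format and the helper names `badram_save`/`badram_load` are hypothetical. The sketch covers only saving and reloading the list. How badramd would hand the PFNs to the badram driver early in boot is left open, as in the proposal.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical badramd list format: one PFN per line, e.g. "0x1a2b3\n". */

/* Write n PFNs to f. Returns 0 on success, -1 on a write error. */
static int badram_save(FILE *f, const uint64_t *pfns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fprintf(f, "0x%" PRIx64 "\n", pfns[i]) < 0)
            return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read up to max PFNs from f into pfns and return how many were read.
 * %x accepts an optional "0x" prefix and skips leading whitespace,
 * so the newline-separated output of badram_save parses directly. */
static size_t badram_load(FILE *f, uint64_t *pfns, size_t max)
{
    size_t n = 0;

    assert(f != NULL && pfns != NULL);
    while (n < max && fscanf(f, "%" SCNx64, &pfns[n]) == 1)
        n++;
    return n;
}
```

The file uses PFNs rather than virtual addresses because physical frames are what stay the same from one boot to the next on the same hardware. The list would still need to be discarded if the DIMMs are moved or replaced.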

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."