[RFC 0/9] mce recovery for Sandy Bridge server

From: Luck, Tony
Date: Mon May 23 2011 - 17:54:35 EST


Here's a nine-part patch series to implement "AR=1" recovery
that will be available on high-end Sandy Bridge server processors.
In this case the processor detects an uncorrectable memory error
while doing an instruction or data fetch that is about to be
consumed. This is in contrast to the recoverable errors on
Nehalem and Westmere that were out of immediate execution context
(patrol scrubber and cache line write-back).

The code is based on work done by Andi last year and published in
the "mce/action-required" branch of his mce git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git
Thus he gets author credit on 6 out of 9 patches (but I'll take
the blame for all of them).

The first eight patches are mostly cleanups and minor new bits
that are needed by part 9 where the interesting stuff happens.

For the "in context" case, we must not return from the machine
check handler (in the data fetch case we'd re-execute the fetch
and take another machine check, in the instruction fetch case
we actually don't have a precise IP to return to). We use the
TIF_MCE_NOTIFY task flag bit to ensure that we don't return to
the user context - but we also need to keep track of the memory
address where the fault occurred. The h/w only gives us the physical
address, so to remember it we have added "mce_error_pfn" to the
task structure - this feels odd, but it is an attribute of the
task (e.g. this task may be migrated to another processor before
we get to look at TIF_MCE_NOTIFY and head to do_notify_resume()
to process it).
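
To make the hand-off described above concrete, here is a minimal
user-space model of it. This is only a sketch, not the code in part 9:
"struct task_model", "model_mce_record()", "model_notify_resume()" and
the NO_ERROR_PFN marker are hypothetical names invented for
illustration, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12          /* assumption: 4 KiB pages */
#define NO_ERROR_PFN UINT64_MAX  /* hypothetical "nothing pending" marker */

/* Hypothetical stand-in for the relevant bits of the task structure. */
struct task_model {
	bool     mce_notify;     /* models the TIF_MCE_NOTIFY flag bit */
	uint64_t mce_error_pfn;  /* page frame of the failing address */
};

/*
 * Machine check handler side: the h/w reports a physical address,
 * so record its page frame number against the task and set the flag.
 */
static void model_mce_record(struct task_model *t, uint64_t phys_addr)
{
	t->mce_error_pfn = phys_addr >> PAGE_SHIFT;
	t->mce_notify = true;
}

/*
 * Return-to-user side: if an error is pending, hand back the saved
 * pfn and clear the state. This works on whichever CPU the task is
 * running on by then, because the state travels with the task.
 */
static bool model_notify_resume(struct task_model *t, uint64_t *pfn)
{
	if (!t->mce_notify)
		return false;

	assert(t->mce_error_pfn != NO_ERROR_PFN);
	*pfn = t->mce_error_pfn;
	t->mce_notify = false;
	t->mce_error_pfn = NO_ERROR_PFN;
	return true;
}
```

Keeping the pfn in the task rather than in per-cpu data is the point
of the model: nothing in model_notify_resume() depends on which
processor took the original machine check.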

Andi's recovery code can also handle a few cases where the
error is detected while running kernel code (when copying
data to/from a user process) - but the TIF_MCE_NOTIFY method
doesn't actually ever get to this code (since the entry_64.S code
only checks TIF_MCE_NOTIFY on return to userspace). I'd
appreciate any ideas on how to handle this. Perhaps we could
do good things when CONFIG_PREEMPT=y (it seems probable that
any error in a non-preemptible section of kernel code is going
to be fatal).

-Tony

arch/x86/include/asm/mce.h | 3 +-
arch/x86/kernel/cpu/mcheck/mce-severity.c | 37 +++-
arch/x86/kernel/cpu/mcheck/mce.c | 286 ++++++++++++++++++++++++-----
arch/x86/kernel/signal.c | 2 +-
include/linux/init_task.h | 7 +
include/linux/sched.h | 3 +
mm/memory-failure.c | 28 ++--
7 files changed, 300 insertions(+), 66 deletions(-)

Andi Kleen (6):
MCE: Always retrieve mce rip before calling no_way_out
MCE: Move ADDR/MISC reading code into common function
MCE: Mask out address mask bits below address granuality
HWPOISON: Handle hwpoison in current process
MCE: Pass registers to work handlers
MCE: Add Action-Required support

Tony Luck (3):
mce: fixes for mce severity table
mce: save most severe error information
mce: run through processors with more severe problems first
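
For readers unfamiliar with the address-granularity patch in the list
above, the idea can be sketched as follows. This is an illustration
only, not the patch itself: it assumes the "recoverable address LSB"
field occupies bits 5:0 of the MCi_MISC register (as described in the
Intel SDM), and the helper names "model_addr_lsb()" and
"model_mask_addr()" are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bits 5:0 of MCi_MISC give the least significant valid
 * bit of the address reported in MCi_ADDR. */
static unsigned int model_addr_lsb(uint64_t misc)
{
	return (unsigned int)(misc & 0x3f);
}

/* Clear the address bits below the reported granularity, since they
 * carry no meaningful information. */
static uint64_t model_mask_addr(uint64_t addr, uint64_t misc)
{
	unsigned int shift = model_addr_lsb(misc);

	return (addr >> shift) << shift;
}
```

With an LSB value of 6 (cache-line granularity) the low six bits of
the address are dropped; with a value of 12 the result is page aligned.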
