Re: [RFC 0/9] mce recovery for Sandy Bridge server

From: Hidetoshi Seto
Date: Wed May 25 2011 - 02:03:57 EST


(2011/05/24 6:54), Luck, Tony wrote:
> Andi's recovery code can also handle a few cases where the
> error is detected while running kernel code (when copying
> data to/from a user process) - but the TIF_MCE_NOTIFY method
> doesn't actually ever get to this code (since the entry_64.S code
> only checks TIF_MCE_NOTIFY on return to userspace). I'd
> appreciate any ideas on how to handle this. Perhaps we could
> do good things when CONFIG_PREEMPT=y (it seems probable that
> any error in a non-preemptible section of kernel code is going
> to be fatal).

How about splitting the work into two steps:
step1) Add support for AR in user space:
- send SIGBUS to affected processes, poison affected memory
- panic if the error is in the kernel
step2) Add support for AR in kernel:
- some new notify/handle mechanism etc.

Doing both at once seems like too big a jump to me.
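
To make the step 1 policy concrete, here is a rough user-space model of the decision only. It is not kernel code and not taken from the patch series: the names mce_context, mce_action, mce_event and step1_decide() are all hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: where the CPU was executing when the
 * action-required (AR) error was signalled. */
enum mce_context {
	MCE_IN_USER,
	MCE_IN_KERNEL
};

/* Hypothetical outcomes allowed in step 1. */
enum mce_action {
	MCE_ACT_SIGBUS_AND_POISON,	/* SIGBUS affected processes, poison memory */
	MCE_ACT_PANIC			/* no in-kernel recovery yet */
};

/* Hypothetical event record; a real handler would carry far more state. */
struct mce_event {
	enum mce_context ctx;
};

/*
 * Step 1 policy: recover only when the error hit user space.
 * Any error taken in kernel context panics until step 2 adds a
 * separate notify/handle mechanism for that case.
 */
enum mce_action step1_decide(const struct mce_event *ev)
{
	assert(ev != NULL);

	if (ev->ctx == MCE_IN_USER)
		return MCE_ACT_SIGBUS_AND_POISON;

	return MCE_ACT_PANIC;
}
```

With this split, step 2 would only need to change the MCE_IN_KERNEL branch, which is the part that depends on a mechanism other than TIF_MCE_NOTIFY.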


Thanks,
H.Seto
