Re: [PATCH] MAINTAINERS: Add x86 RAS people

From: Borislav Petkov
Date: Fri Jun 17 2011 - 09:51:26 EST


On Fri, Jun 17, 2011 at 09:27:58AM -0400, Pavel Machek wrote:
> On Tue 2011-06-14 18:08:54, Borislav Petkov wrote:
> > Announce the new RAS infrastructure maintainers. The file patterns below
> > will change after we start the restructuring.
> >
> > Signed-off-by: Borislav Petkov <bp@xxxxxxxxx>
> > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> >
> > +X86 RAS INFRASTRUCTURE
> > +M: Tony Luck <tony.luck@xxxxxxxxx>
>
> this would be great place to explain "ras"...

RAS stands for Reliability, Availability and Serviceability. Wikipedia has a basic overview: http://en.wikipedia.org/wiki/Reliability,_Availability_and_Serviceability

Our idea is to make error collection and reporting much easier to
configure and manage, now that reliability features have become much
more important on x86. You want to be able to enforce policies from
userspace, for example counting errors per hardware device (such as DRAM
ECC errors per DIMM) and taking action when thresholds are reached. We
also want a much better, unified error injection scheme for testing
system reliability, etc.
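To make the per-device threshold idea a bit more concrete, here is a
minimal sketch, assuming a fixed number of DIMMs and a single
corrected-error threshold. NR_DIMMS, ECC_THRESHOLD, record_ce() and
dimm_ce_count() are hypothetical names made up for illustration; they are
not an existing kernel or EDAC interface.

```c
#include <stdio.h>

/* Hypothetical policy parameters, for illustration only. */
#define NR_DIMMS	4
#define ECC_THRESHOLD	3

struct dimm_stats {
	unsigned int ce_count;	/* corrected ECC errors seen on this DIMM */
	int tripped;		/* set once the threshold has been reached */
};

/* Static storage, so all counters start at zero. */
static struct dimm_stats dimms[NR_DIMMS];

/*
 * Account one corrected error on @dimm (hypothetical helper).
 * Returns -1 for an invalid DIMM index, 1 on the error that first
 * reaches ECC_THRESHOLD, and 0 otherwise.
 */
static int record_ce(unsigned int dimm)
{
	if (dimm >= NR_DIMMS)
		return -1;

	dimms[dimm].ce_count++;

	if (!dimms[dimm].tripped && dimms[dimm].ce_count >= ECC_THRESHOLD) {
		dimms[dimm].tripped = 1;
		return 1;
	}
	return 0;
}

/* Current corrected-error count for @dimm, or 0 for an invalid index. */
static unsigned int dimm_ce_count(unsigned int dimm)
{
	return dimm < NR_DIMMS ? dimms[dimm].ce_count : 0;
}

/* Example reaction a policy daemon might take; here it only prints. */
static void report_if_tripped(unsigned int dimm)
{
	if (record_ce(dimm) == 1)
		printf("DIMM %u: %u corrected errors, threshold reached\n",
		       dimm, dimm_ce_count(dimm));
}
```

The action taken when a threshold trips is deliberately left to the
caller (report_if_tripped() merely prints); the point is that userspace,
not the error-decoding code, decides what to do about it.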

Another important issue is saving oops information to persistent storage
so that it can be evaluated after a reboot. While this is easy to do on
servers with NVRAM, we still have no solution for general-purpose
laptops. hpa had a project that encoded oopses as 2D barcodes, but I
haven't had a chance to look into it yet.

So those are the kinds of things we have in mind. I'm pretty sure I'm
leaving something out, but you should get the idea...

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551