Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support

From: Ingo Molnar
Date: Mon Oct 25 2010 - 08:56:02 EST



* Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:

> On Mon, Oct 25, 2010 at 01:15:30PM +0200, Ingo Molnar wrote:
>
> > > > > einj.c: it's about the 3rd separate 'error injection' concept that got
> > > > > introduced ...
> > > >
> > > > EINJ is a true platform feature, not just a software feature. We need to support
> > > > it to debug various hardware error features.
> > >
> > > Also having multiple error injecting interfaces is a good thing.
> >
> > It's never a good thing to have separate, vendor dependent interfaces for what
> > to the user is basically the same conceptual thing!
>
> Perhaps a simple example (simplified, in practice there are more complications)
> makes it more clear:
>
> The memory error handler takes different actions depending on the state of the
> page the error occurs on.

What you appear to be arguing for is the ability to inject different types of
events.

_OF COURSE_ we want that.

Just like we want to be able to _receive_ multiple types of events from wildly
different hardware and wildly different kernel subsystems ...

Duh.

That desire does not necessitate 'three different injectors' at all. It does not
necessitate multiple incompatible facilities with random ABIs.

What we want is a single injector facility visible to RAS/hw-testing/etc. apps, and
a way to pass in attributes that specify the kind of event that we want to trigger.

Also note that you completely ignored the other basis of my objection and NAK: that
the whole ad-hoc event log export that this code does via the /dev/erst-dbg ABI is
actively harmful.

> Would it be nice if there was a single great injector that covers everything? Yes
> Is it realistic? No.

Everyone else working on this area thinks it's realistic, and is in fact working on
such a facility.

The main thing that is causing confusion here is not the technical viability of such
a project (it's evidently doable and desirable), but your unwillingness to
cooperate. If Intel goes off in random directions and essentially obstructs upstream
projects, then we won't have this implemented on Intel CPUs sanely and cleanly -
despite Mauro's best efforts on the Nehalem code.

But you should really not bring that up as some kind of positive argument ...

Thanks,

Ingo