Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll

From: Andi Kleen
Date: Tue Jan 26 2010 - 11:09:27 EST


On Tue, Jan 26, 2010 at 06:06:26PM +0900, Hidetoshi Seto wrote:
> How about having a system file which can be maintained with kernel,
> e.g. like /proc/hwinfo, /sys/devices/platform/hwinfo, or directory
> with some files like /somewhere/hwinfo/{dmi,acpi,pci,...} etc.?

Why not do that in user space?

In fact it's often already done.

Just because we're kernel programmers doesn't mean that everything
needs to be solved inside the kernel.

> >> And since it's kernel
> >> based it cannot do most of the interesting reactions. And it doesn't
> >> have a usable interface to add user events.
> >>
> >> And yes having all that crap in syslog is completely useless, unless
> >> you're debugging code.
> >
> > So basically, IMHO we need:
> >
> > 1. Resilient error reporting that reliably pushes decoded error info to
> > userspace and/or network. That one might be tricky to do but we'll get
> > there.
>
> I think it would be better to treat "error" as a subset of "event",
> which could be reported if of interest but otherwise filtered.
> The use of TRACE_EVENT() for MCE events aims at such an approach, at least.

The whole trace event infrastructure right now is not really
aimed/useful for "always on low overhead background monitoring" like
standard error handling requires.

In principle it could probably be fixed (although I'm a bit
sceptical on the "low overhead" part), but I suspect the result
would be optimized neither for error handling nor for
performance monitoring anymore. The two simply have
very different requirements.

When you do full event tracing anyway, it makes some sense to get events
for errors too, but that's quite a different use case.

For the "standard" error handling I think we're better off with
something optimized for the job.

> > 2. Error severity grading and acting upon each type accordingly. This
> > might need to be vendor-specific.
>
> I think you mean severity grading in kernel.
> Even if hardware reported an error and graded it as corrected, kernel
> can escalate the severity, likely based on some threshold.

I don't think the kernel should do that. It's very much a policy
decision, and those are best kept as close to the administrator
as possible (= user space).

That is, for some cases it might make sense to have limited thresholds
in the kernel, but I suspect those cases are few. Mostly it would
be when the hardware itself already keeps these counters.
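
To make the "policy in user space" point concrete, here is a rough
sketch of a corrected-error threshold kept by a user-space daemon.
It is only an illustration: struct ce_threshold, ce_account() and the
fixed-window scheme are hypothetical names and choices, not mcelog
code and not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-component policy; limit and window would come from
 * a configuration file the administrator edits, not from the kernel. */
struct ce_threshold {
	unsigned limit;       /* corrected errors tolerated per window */
	time_t window;        /* window length in seconds */
	time_t window_start;  /* start of the current window */
	unsigned count;       /* corrected errors seen in this window */
};

/* Account one corrected error observed at time 'now'. Returns true
 * once more than 'limit' errors have been seen inside one window. */
static bool ce_account(struct ce_threshold *t, time_t now)
{
	assert(t != NULL);

	if (now - t->window_start >= t->window) {
		/* The old window has expired: start counting afresh. */
		t->window_start = now;
		t->count = 0;
	}
	t->count++;
	return t->count > t->limit;
}
```

A daemon would call ce_account() for every corrected error event it
reads from the kernel and run whatever action the administrator
configured (log, send mail, offline a page) when it returns true.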

>
> > 3. Proper error format suiting all types of errors.
>
> As mentioned in Andi's PDF, the CPER format is one of the good
> candidates available today, I think.

Yes, for hardware errors. It's definitely not perfect and somewhat
overdesigned, but I'm not sure we could come up with a much better one.
At least a subset of it with some extensions might do. Also in some
cases the error is already in this format.

The advantage of it is that it's at least well understood and documented.

> > 4. Vendor-specific hooks where it is needed for in-kernel handling of
> > certain errors (L3 cache index disable, for example).
>
> There would be some difficulty in adding such a hook in the UE handling
> path, but anyway we can have it for the CE path. It just needs to be
> organized.
>
> > 5. Error thresholding, representation, etc all done in userspace (maybe
> > even on a different machine).
>
> (...BTW, how about putting mcelog tree under the /tools, Andi?)

I don't see the advantage. Linux has always been a collection
of packages, not a unified single big tree. Also my current
impression is that the in tree user space builds don't work
very well.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.