Re: [PATCH -v2 2/3] ACPI, APEI, Add APEI generic error status print support

From: Andrew Morton
Date: Tue Nov 30 2010 - 18:50:12 EST


On Tue, 30 Nov 2010 15:00:31 +0800
Huang Ying <ying.huang@xxxxxxxxx> wrote:

> > However in this case you are avowedly treating the printks as a
> > userspace interface, with the intention that software be written to
> > parse them, yes? So once they're in place, we cannot change them? That
> > makes it more important.
>
> If my understanding is correct, Linus still doesn't like the idea of a
> user space hardware error tool.

I'm sure he has no problem with a userspace tool ;) Surely what he doesn't
like is the proposed kernel interface.

> On the other hand, if we need this tool, I
> think printk is not a good tool-oriented hardware error reporting
> interface for it, because:
>
> - There is no overall format and there are no record boundaries for
> printk, because printk is traditionally used for 1-2 line messages.
> This makes printk output hard to parse in general.

Well. These things can be addressed by careful choice of output
format.
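
As a rough illustration of what a "careful choice of output format" could
buy a userspace parser, here is a minimal sketch. It assumes a purely
hypothetical line format in which every line of a record carries a
"[hwerr <id>]" prefix; neither the tag nor the field names come from the
patch under discussion. With such a prefix a tool can reassemble records
even when unrelated lines land in between:

```python
import re
from collections import defaultdict

# Hypothetical format: "[hwerr <record-id>] <payload>". The tag and the
# payload fields are assumptions made for this sketch only.
LINE_RE = re.compile(r"^\[hwerr (\d+)\] (.*)$")


def group_records(lines):
    """Group tagged log lines by record id, skipping untagged lines."""
    records = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            records[int(match.group(1))].append(match.group(2))
    return dict(records)


# Two records whose lines are interleaved with each other and with an
# unrelated kernel message.
log = [
    "[hwerr 1] severity: corrected",
    "[hwerr 2] severity: fatal",
    "usb 1-1: new device",
    "[hwerr 1] section: memory",
    "[hwerr 2] section: pcie",
]
records = group_records(log)
```

This only shows that record boundaries can be recovered from a tagged
format; it says nothing about verbosity or priority.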

> - Messages from different CPUs may be interleaved.

A single printk() should be atomic.

> - Good error reporting is too verbose for kernel log
>
> - printk has no internal priority support, so high severity errors
> have no more priority than low severity ones.
>
>
> So my opinion is:
>
> - Use printk as human oriented hardware error reporting.
> - Use another tool oriented interface for user space hardware error tool
> if necessary.
>
> Do you agree? Do you think printk can be used as a good tool-oriented
> hardware error reporting interface too?

I agree that using printk() is pretty sucky.

However your proposals are waaaaaaaaay too narrow and specific IMO.
There are several reasons why people want more regular and structured
kernel->userspace messaging features. One such requirement is for
internationalisation: people want messages to come out in some
non-language-specific manner so that userspace tools can perform
catalogue lookups and display the messages in the appropriate language.
Others (eg google) want to feed the messages into large-scale
capturing systems for offline analysis. And there are other
requirements which I forget. Such a messaging/logging system would
also incorporate the requirement to log to a persistent store.

So I think that quite a lot of people would be interested in proposals
for a new and improved kernel->userspace messaging/logging facility.
But talking about "hardware error reporting" (especially when it covers
only a teeny subset of possible hardware errors!) is very myopic.

And implementing the broad facility would be a pretty big project. Simply
chasing down all the stakeholders and understanding their needs would
turn one's hair grey.

So we're a bit stuck, really. We would benefit from a quite broad and
expensive-to-implement messaging/logging system, but we don't even know
what that will look like yet. You have a small and highly-specific
subset of that. If we merge the subset then it probably will live
forever even if the broader feature gets written one day, because the
subset is userspace-visible and adds interfaces which the larger system
probably won't even implement.

So... for your pretty narrow and specific problem, perhaps using
printk as a stopgap until something better comes along is the correct
choice.