Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)

From: Thomas Gleixner
Date: Wed Nov 10 2010 - 18:59:57 EST


On Wed, 10 Nov 2010, Mathieu Desnoyers wrote:
> * Thomas Gleixner (tglx@xxxxxxxxxxxxx) wrote:
> > > The reason why "concurrent read/write" is required is for server-class machines
> > > which need to continuously be able to gather trace data to report/find/locate
> > > problematic scenarios happening. This means we're not only interested in one
> > > single failure, but rather by a whole set of erroneous/warning conditions that
> > > need to be reported. Stopping tracing every time data is gathered is
> > > inappropriate, because it would hide errors/warnings that would be happening
> > > during data collection.
> >
> > Aargh! Just because it can be done all in one with an insane amount of
> > complexity does not mean that it's an absolute requirement and a good
> > solution.
> >
> > So if you want to have both the flight recorder crash documentation
> > and the ongoing monitoring then use two separate sessions with
> > separate modes and be done with it.
> >
> > Cramming both into the same session is just insane.
> >
>
> I'm afraid this is not what I proposed above. I'm open to use different tracing
> sessions for different things. However, the server-class case needs to
> continuously gather data so that "trace-shots" can be gathered when problems
> occur. But if you hit two problems back to back, you don't want to lose the
> trace leading to the second issue. Hence the motivation for supporting
> concurrent reading while writing.

Realistically, you are interested in the first one, simply because in
99.9% of the cases the second problem is caused by the first one. Do we
really need to care about the 0.1% which fall into the other category?

Not at all. Simply because the likelihood of those back-to-back events
occurring _AND_ giving us the 0.1% case is approaching zero.

Of course you can argue with your academic hat on that I'm ignoring
that we might catch this rare "easter and xmas fall on the same day"
event, but I couldn't care less.

> I'd like to start with an implementation that skips some of these requirements
> initially, but what I really think we need to figure out is how we organize our
> ABIs to finally support these requirements.

I did not say that you should not think about this, but the progress
so far in more than TWO YEARS is exactly ZERO. And that's what I'm
concerned about.

Thanks,

tglx