Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing

From: Andrew Morton
Date: Wed Apr 29 2009 - 00:35:53 EST


On Wed, 29 Apr 2009 13:03:51 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:

> > But in this case the approach is different - the problem statement is
> > "I need to add tracepoints to subsystem X". It's not driven by any
> > particular development problem. So there's no guarantee at all that the
> > end result will be _useful_ for anything!
>
> May I explain my opinion? I am the original author of the latency enhancement
> patch for the workqueue tracer.
>
> In the real world, desktop and server users use various out-of-tree drivers and kernel
> modules (e.g. some graphics drivers, DRBD, proprietary security software, et al.),
> and poor-quality drivers often have bugs in asynchronous processing paths
> (e.g. timer, workqueue, irq).
>
> Such bugs may not be easy to analyze or fix. If a kernel oops happens,
> it's easy: the oops log clearly points out the suspect in almost every case.
> But if the poor driver causes large latency, the evidence has often vanished
> before the latency occurs.
>
> When trouble happens, an administrator is greatly baffled: "Oh, which software
> is wrong? How do I tell good software from bad?"
> In the past, we always said "hehe, you use a proprietary module. Can you
> reproduce this bug on an upstream-only kernel?" This answer neither helps
> the end user nor solves the problem. It is a way of escaping accountability.
>
> Good, well-defined static tracepoints help greatly in this situation.
>
>
> In addition, as far as I know, typical DTrace users don't use the dynamic
> tracing feature at all.
> They feel they don't know how to choose the proper probe points for dynamic tracing.
> They repeatedly and strongly ask for more well-defined static probe points. They
> think the dynamic tracing feature is too hard to use.
>
> I also strongly dislike randomly placed tracepoints, but I don't think this patch
> series is random,
> and I think other asynchronous processing subsystems need static tracepoints too.

OK.

It's quite unclear to me how we get from here to a situation where we
have something which your administrator can use. Hopefully someone
some day will pull all this together into an overall integrated
toolset. The fact that the kernel work is being done (afaict)
waaaaaaaay in advance of that development means that we'll probably
have to revisit the kernel work. So be it.

But your administrator wouldn't even know to go looking at workqueues!
Unless the userspace support tools are very very good. He might
instead spend hours poking at the sleep-tracer or the rwsem-tracer or
the slow-work-tracer or blah blah.

I expect that a generic function-entry tracer (which we already have)
would be the starting tool for diagnosing such a problem. Probably it
would be the ending tool too.

What's the terminal state here? The end result? I see lots of random
somewhat-useless-looking tracers being added, but what are we actually
working towards?

Until we know that, how do we know that we're not adding stuff
which we won't need (as I suspect we are doing)?