Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing

From: Steven Rostedt
Date: Fri Apr 24 2009 - 22:51:20 EST



On Fri, 24 Apr 2009, Andrew Morton wrote:

> On Fri, 24 Apr 2009 22:00:20 -0400 (EDT) Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> >
> > I agree that we need to be frugal with the addition of trace points. But
> > I don't think the bugs that can be solved with this are always reproducible
> > by the developer.
> >
> > If you have a distribution kernel that is running at a customer's location,
> > you may not have the privilege of shutting down that kernel, patching the
> > code, recompiling and booting up this temporary kernel. It would be nice
> > to have strategic locations in the kernel where we can easily enable a
> > trace point and monitor what is going on.
> >
> > If the customer calls and tells you there are some strange performance
> > issues when running such and such a load, it would be nice to look at
> > things like workqueues to analyze the situation.
>
> Would it? What's the probability that anyone anywhere will *really*
> solve an on-site problem using workqueue tracepoints? Just one person?
>
> I think the probability is quite small, and I doubt if it's high enough
> to add permanent code to the kernel.
>
> Plus: what we _really_ should be looking at is
>
> p(someone uses this for something) -
> p(they could have used a kprobes-based tracer)

This is starting to sound a lot like a Catch-22. We don't want it in the
kernel if nobody is using it. But nobody is using it because it is not in
the kernel.

>
> no?
>
> > Point being, the events are not for me on the box that runs my machines.
> > Hell, I had Logdev for 10 years doing that for me. But now we have
> > something running at a customer's site with extremely low overhead
> > that we can enable when problems arise. That is what makes this
> > worthwhile.
> >
> > Note, when I was contracting, I even had logdev prints inside the
> > production (custom) kernel that I could turn on and off. This was exactly
> > for this purpose: to monitor what is happening inside the kernel when in
> > the field.
>
> We seem to be thrashing around grasping at straws which might justify
> the merging of these tracing patches. It ain't supposed to be that way.


Unfortunately, analyzing system behavior is a lot like grasping at straws.
You may never know what is causing some problem unless you view the entire
picture.

Perhaps the workqueue tracer is not by itself useful for the majority of
people; I'm not arguing that it is. It comes pretty much free if you are not
using it.

I'm looking more at the TRACE_EVENTs in the workqueue (and other places),
because having strategically located trace points throughout the kernel
that you can enable all at once can help analyze the system for whatever
is causing problems. You might think you are having interrupt issues, but
enable all events and you may notice that the issue is actually in the
workqueues. Picking your own kprobe locations is not going to help in
that regard.
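
Just as an illustration (this is not code from the patch set under
discussion, and the event and field names here are made up), a TRACE_EVENT
definition for such a strategic point might look roughly like this, with
the usual trace-header boilerplate (TRACE_SYSTEM, CREATE_TRACE_POINTS,
etc.) left out:

#include <linux/workqueue.h>
#include <linux/tracepoint.h>

/* Hypothetical event fired when a worklet starts executing. */
TRACE_EVENT(worklet_execute,

        TP_PROTO(struct work_struct *work),

        TP_ARGS(work),

        TP_STRUCT__entry(
                __field(void *, work)
                __field(void *, func)
        ),

        TP_fast_assign(
                __entry->work = work;
                __entry->func = work->func;
        ),

        TP_printk("work=%p func=%p", __entry->work, __entry->func)
);

Events defined this way show up under the tracing debugfs directory
(e.g. /sys/kernel/debug/tracing/events/), where they can be enabled
individually, per subsystem, or all at once, which is exactly the
"enable everything and see where the problem really is" workflow I'm
describing above.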

In the old -rt patch series, we had trace points scattered all over the
kernel. This was the original "event tracer". It was low overhead and could
still give a good overview of the system when the function tracer produced
too much data. Yes, we solved many issues in -rt because of the event tracer.

Ideally, you want to minimize the trace points so that the result does not
look like a debug session gone wild. I could maintain a set of tracepoints
out of tree, but that would only be good for me, and not for others.

BTW, you work for Google; doesn't Google claim to have some magical
20-some tracepoints that are all they need? Could you give us a hint as to
what and where they are?

-- Steve