Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing

From: Steven Rostedt
Date: Fri Apr 24 2009 - 22:00:34 EST



On Fri, 24 Apr 2009, Andrew Morton wrote:

> On Sat, 25 Apr 2009 02:37:03 +0200
> Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
> > I discovered it with this tracer. Then it brought me to
> > write this patch:
> >
> > http://lkml.org/lkml/2009/1/31/184
> >
> > ...
> >
> > Based on these same observations, I wrote another one:
> >
> > http://lkml.org/lkml/2009/1/26/363
>
> OK, it's great that you're working to improve the workqueue code. But
> does this justify permanently adding debug code to the core workqueue
> code? In fact, because you've discovered these problems, the reasons
> for adding the debug code have lessened!
>
> What we need are curious developers looking into how well subsystems
> are performing and how well callers are using them. Adding fairly
> large amounts of permanent debug code into the core subsystems is a
> peculiar way of encouraging such activity.
>
> If a developer is motivated to improve (say) workqueues then they will
> write a bit of ad-hoc code, or poke at it with systemtap or will
> maintain a private ftrace patch - that's all pretty simple stuff for
> such people.
>
> So what is the remaining case for adding these patches? What I see is
>
> a) that their presence will entice people to run them and maybe find
> some problems and
>
> b) the workqueue-maintainer's task is lessened a bit by not having
> to forward-port his debugging patch.
>
> I dunno, it all seems a bit thin to me. Especially when you multiply
> it all by nr_core_subsystems?

I agree that we need to be frugal with the addition of trace points. But
I don't think the bugs that can be solved with this are always reproducible
by the developer.

If you have a distribution kernel that is running at a customer's location,
you may not have the privilege of shutting down that kernel, patching the
code, recompiling and booting up this temporary kernel. It would be nice
to have strategic locations in the kernel where we can easily enable a
trace point and monitor what is going on.

If the customer calls and tells you there are some strange performance
issues when running such and such a load, it would be nice to look at
things like workqueues to analyze the situation.

Point being, the events are not for me on my own machines.
Hell, I had logdev for 10 years doing that for me. But to have
something running at a customer's site, with extremely low overhead,
that we can enable when problems arise: that is what makes this
worthwhile.

Note, when I was contracting, I even had logdev prints inside the
production (custom) kernel that I could turn on and off. This was exactly
for this purpose: to monitor what is happening inside the kernel while it
is in the field.

-- Steve
