Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)

From: Ted Ts'o
Date: Thu Nov 11 2010 - 13:26:06 EST


On Wed, Nov 10, 2010 at 06:20:27PM -0500, Steven Rostedt wrote:
> On Thu, 2010-11-11 at 00:12 +0100, Thomas Gleixner wrote:
>
> > Cramming both into the same session is just insane.
>
> That just doubled the overhead of the tracer.

At least when I've used ftrace for the "flight recorder" use case, I'm
not tracing as well. What I do is enable a bunch of tracepoints,
maybe sprinkle some trace_printk()'s into various kernel
code paths, and then run the workload which locks up the kernel.
When it locks up, I use sysrq-z to dump out the ftrace ring buffer,
and usually _exactly_ what I need to debug the lockup is waiting for
me in the ring buffer.

So this use case is incredibly useful, and I hope that whatever folks do
with the new-fangled API, "overwrite mode" is somehow supported.
Even if, for speed reasons, what you do is wait for the head to
overrun the tail, and then bump the tail up by 50% so that we lose half
the log (so that whatever expensive locking is necessary only happens
once in a while), I at least would find that quite acceptable.
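
A minimal userspace sketch of that "bump the tail by 50%" idea, to make
the trade-off concrete. This is not the ftrace ring buffer; the names
(flight_rb, rb_write, RB_SIZE) and the tiny fixed capacity are
assumptions made up for illustration, and there is no locking at all:

```c
#include <assert.h>
#include <stddef.h>

#define RB_SIZE 8	/* hypothetical capacity, in entries */

/* Hypothetical flight-recorder buffer; not the kernel's layout. */
struct flight_rb {
	unsigned long entries[RB_SIZE];
	size_t tail;		/* index of the oldest valid entry */
	size_t count;		/* number of valid entries */
	unsigned long lost;	/* entries discarded so far */
};

void rb_init(struct flight_rb *rb)
{
	rb->tail = 0;
	rb->count = 0;
	rb->lost = 0;
}

/*
 * Overwrite mode: never refuse a write.  When the buffer is full,
 * discard the oldest half in one step, so the (potentially expensive)
 * tail update happens once per RB_SIZE/2 writes, not on every write.
 */
void rb_write(struct flight_rb *rb, unsigned long ev)
{
	if (rb->count == RB_SIZE) {
		size_t drop = RB_SIZE / 2;

		rb->tail = (rb->tail + drop) % RB_SIZE;
		rb->count -= drop;
		rb->lost += drop;
	}
	rb->entries[(rb->tail + rb->count) % RB_SIZE] = ev;
	rb->count++;
}

/* Oldest event still in the buffer; the buffer must not be empty. */
unsigned long rb_oldest(const struct flight_rb *rb)
{
	assert(rb->count > 0);
	return rb->entries[rb->tail];
}
```

With RB_SIZE of 8, writing events 0 through 8 leaves events 4 through 8
in the buffer: the ninth write finds the buffer full and throws away
events 0 through 3 in one go. The most recent events, which are the
ones you want after a lockup, are always kept.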

The other feature/requirements request I would make is that there
should be a way to make common kernel abstractions available, such as
converting a dev_t to either a MAJOR/MINOR number pair or a device
name. For now I've changed the tracepoints to do the MAJOR/MINOR
translation themselves and drop the resulting integers into the ring
buffer, and a generic workaround in the future is to always drop
strings into the ring buffer instead of allowing the translation to be
done in TP_printk
(which doesn't work for perf; it causes the userspace perf client to
fall over and die, without even skipping the problematic tracepoint
record --- boo, hiss). But that can be relatively inefficient,
because we now have to drop potentially fairly large text strings
into the ring buffer to work around limitations in perf's output
transformation step.
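
A rough userspace sketch of the "translate first, store integers"
approach described above. The 20-bit minor split mirrors the kernel's
MAJOR()/MINOR() macros in include/linux/kdev_t.h; everything else (the
SK_ prefix, the sk_trace_rec record layout, the ino field, the output
format) is a hypothetical stand-in, not an actual tracepoint:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same split as the kernel's kdev_t.h; the SK_ names are hypothetical. */
#define SK_MINORBITS	20
#define SK_MINORMASK	((1U << SK_MINORBITS) - 1)
#define SK_MAJOR(dev)	((unsigned int)((dev) >> SK_MINORBITS))
#define SK_MINOR(dev)	((unsigned int)((dev) & SK_MINORMASK))
#define SK_MKDEV(ma, mi) (((uint32_t)(ma) << SK_MINORBITS) | (mi))

/* Hypothetical record: two plain integers instead of a dev_t or a string. */
struct sk_trace_rec {
	unsigned int major;
	unsigned int minor;
	unsigned long ino;
};

/* Done at trace time: split the device number before storing it. */
void sk_fill(struct sk_trace_rec *r, uint32_t dev, unsigned long ino)
{
	r->major = SK_MAJOR(dev);
	r->minor = SK_MINOR(dev);
	r->ino = ino;
}

/*
 * Done at output time: only plain integer formatting is needed, so the
 * consumer does not have to know how a device number is encoded.
 */
int sk_format(const struct sk_trace_rec *r, char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "dev %u:%u ino %lu",
			r->major, r->minor, r->ino);
}
```

The record costs two integers rather than a device-name string, and
the output step is reduced to formatting integers.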

I know that because perf is doing its output transformation in
userspace, there are fundamental limitations about what it can do.
But it would be nice if it could be expanded at least _somewhat_, and
either way, there needs to be some clear documentation about what it
can and cannot accept. And if these limitations mean that I should
simply continue using ftrace, and not use perf, it would be nice
if the tracepoints I create that work with ftrace don't cause perf to
just die horribly when it tries to parse them.

- Ted