Re: [RFC] Unified Ring Buffer (Next Generation)

From: Mathieu Desnoyers
Date: Wed May 19 2010 - 14:47:54 EST


* Andi Kleen (andi@xxxxxxxxxxxxxx) wrote:
> On Wed, May 19, 2010 at 01:51:54PM -0400, Steven Rostedt wrote:
>

Hi Andi,

> Hi Steven,
>
> > More than a year and a half ago (September 2008), at Linux Plumbers, we
> > had a meeting with several kernel developers to come up with a unified
> > ring buffer. A generic ring buffer in the kernel that any subsystem
> > could use. After coming up with a set of requirements, I worked on
>
> If we take a step back.
>
> Why do you want a single ring buffer for everyone?
>
> A lot more low profile subsystems subsystems simply use kfifo
> (which is also actively developed by Stefanie). In fact there
> are far more users of it than of your ring buffer. And it's
> really quite simple and easy to use. And it works fine for them.
>
> I don't think it's that great a goal to have a single ring buffer
> for all possible ring buffer needs. After all the requirements
> are quite different.

One requirement seems to be shared for most tracing heavy users: it has to be
blazingly fast. This is a requirement that is commonly overlooked by the "very
simple" implementations.

>
> Some want a simple ring buffer with minimal overhead
> and simple interface, others need a mmaped one or have other special
> requirements. One size doesn't fit all.

The plan here is to create a ring buffer that supports per-buffer instance
"flags" that specify what must be supported: e.g. either splice() or mmap(),
global vs per-cpu buffers, etc.

I already have much of the code that can easily support all these use-cases with
only minor modifications. So I don't see why we should duplicate the code.

>
> It's also not that we're talking about gigantic amounts of code
> in all cases where there is a pressing need to unify.

For efficient flavors of ring buffer, it's not that much code, that's true, but
the complexity level/LOC is rather higher than for standard code. So it's good
if we can focus our review efforts on a single ring buffer.

> If perf's current ring buffer works for it why not keep using it?

I'll let the perf developers answer this one.

> One problem I always had with your version was that it's quite
> bloated frankly, especially in terms of code size, but
> also in its data structures and in the interface complexity.

I share your concern about the level of complexity of Steven's lockless ring
buffer implementation. Even though Lai and I are listed as "reviewed-by" in the
ring buffer code header, I remember that neither him nor myself were
particularly comfortable with the way complexity has been handled.

The new implementation I propose lessens the complexity level, presents clear
abstractions to deal with that complexity, and comes with a formal proof of
correctness, all of which I think is really very important to give a good level
of insurance that the ring buffer works as expected.

>
> For debugging kernels etc. with tracing that's not that big an issue, but
> I think it's a problem for "non debugging" use. After all Linux
> still has the goal to be at least configurable as a low footprint operating
> system.

My implementation, at the moment, has 50% less lines of code and is 25% smaller
in object size than the current ring buffer.

But all in all, I think users needing _something_ to perform system-wide tracing
shout a lot louder than users who need to save a few bytes. So let's try to get
something good in first, while keeping an eye on the object size, and if it
happens to be too large for some users, then they can always implement a
slower and less efficient ring_buffer_tiny.c if they feel like it.

The main concern for the ring buffer code is more about the number of cache
lines it grabs from the system when it is active, and about being blazingly
fast. We must keep the number of active cache lines very low, sometimes at the
expense of overall code size (e.g., having separate fast/slow paths).

>
> Part of the reason for its big code size seems to be that
> it tries to support everyone's requirements, which unsurprisingly
> leads to some bloat both in implementation and interface.

At the very least Steven's code did not meet the perf requirements nor the LTTng
requirements. As far as the "next gen" version is concerned, I plan to allow
myself to sometimes say "no" (with proper counter-arguments of course) when
people come up with ideas that will either needlessly complexify the ring buffer
code or won't take into account the "special" very heavy usage nature of the
ring buffer.

I really believe in code review and comments. I just don't feel I must always
incorporate all the modifications that are requested without proper
justification. It's just the approach I have, and it seems to have led to good
results so far if we take Tracepoints as an example.

>
> Also to be honest it's so clever now that at least I have
> a hard time understanding it, and I personally prefer code
> that I can understand over too clever code. After all if there
> is a bug in there and you need to be more clever than the programmer
> to debug it, how would that be done?

I totally agree with you. This is in good part why I spent a large part of 2009
writing papers explaining my ring buffer, doing Promela models and formal proofs
of correctness. I think after all that work, the abstractions I will use will be
much easier to grap by anyone willing to do a bit of reading.

>
> Perhaps a better goal would be to have a smaller simpler more
> maintainable buffer for ftrace and let the other users their own?

The advantage of having both ftrace and perf as ring buffer users is that they
can both give me feedback in the development, and this increases chances that
the API will be generic, clean and well documented.

Thanks,

Mathieu

>
> Just my 0.05 cent.
>
> -Andi
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/