Re: [PATCH] tracing/filters: add run-time field descriptions toTRACE_EVENT_FORMAT events

From: Steven Rostedt
Date: Wed Apr 01 2009 - 21:36:39 EST



On Wed, 1 Apr 2009, Ingo Molnar wrote:

>
> * Tom Zanussi <tzanussi@xxxxxxxxx> wrote:
>
> > > Oh, do you know why these unfiltered event are disappearing?
> >
> > It's because of the way ring_buffer_discard() currently works -
> > the filtered events are still there in the ring buffer taking up
> > space; they just aren't visible in the output because the read
> > skips over them.
> >
> > So if there's a high volume of events being recorded, any given
> > event can be quickly overwritten before it has a chance to be
> > read, or it may have been read and appeared in the output, but
> > between the time it was read and the next cat, could easily have
> > been overwritten, and therefore 'disappears' on the next cat.
> >
> > It's really no different from the normal case where there is no
> > filtering in place - the same thing would happen but you wouldn't
> > notice whether a particular event was still there or not the next
> > time you cat'ed the file, because of the large volume of events
> > being displayed. It's just more apparent when most of the events
> > are discarded and invisible.
> >
> > At least that's my theory based on my understanding of how the
> > ring-buffer works...
>
> Correct, and this needs to be fixed.

The problem is the old performance vs functionality.

The thing is, to help with performance we need to copy directly into the
ring buffer. The user reserves a location in the ring buffer that is
guaranteed to not go away until the user commits. We also want to allow
re-entrant recording into the buffer. But if data has already been
reservered, it must go after the record:

ring_buffer_lock_reserve()

+--------------+
| some data |
+--------------+ <-- pointer returned by ring_buffer_lock_reserve
| reserved data|
+--------------+
| empty |
+--------------+

The caller can then write directly into the ring buffer. When they are
through, the caller would call ring_buffer_unlock_commit which would
do the following.

+--------------+
| some data |
+--------------+
| written data |
+--------------+
| empty |
+--------------+

Now what Tom wants to do is write to that reserve data, test that data and
then remove it if we do not want it. The problem with this idea is that we
could have a second write already adding data of its own:



ring_buffer_lock_reserve()

+--------------+
| some data |
+--------------+ <-- pointer returned by ring_buffer_lock_reserve
| reserved data|
+--------------+
| empty |
+--------------+

Interrupt comes in and reserves and writes data:

+--------------+
| some data |
+--------------+ <-- pointer returned by ring_buffer_lock_reserve
| reserved data|
+--------------+
| data from |
| interrupt |
+--------------+
| empty |
+--------------+

Now we have a hole in the buffer. If we discard the reserved data it we
can not reuse it.

Now what I could do, is to add a cmpxchg. If there's no data after the
data that we want to discard, then we discard it. Note, this can only
happen before the commit is made.

I'll see if I can implement that.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/