Re: [PATCH 7/7] ring-buffer: Add lost event count to end of subbuffer

From: Steven Rostedt
Date: Thu Apr 01 2010 - 08:23:48 EST


On Thu, 2010-04-01 at 08:13 +0200, Frederic Weisbecker wrote:
> On Wed, Mar 31, 2010 at 11:17:42PM -0400, Steven Rostedt wrote:

> If you don't have enough room to put lost_events in the end of
> the buffer, you could keep as is cpu_buffer->last_overrun,
> so the next time you swap the reader page, the new delta
> will be added to the previous that you couldn't transmit.
>
> That way you ensure the user always know when you lose events.

When is not the issue since we set the MISSING_EVENTS flag already. It's
how many were lost at that time that we do not store.

I actually found in most cases there is room, and trace-cmd can report
it. When there is no room, it just says events were dropped without
reporting the actual number.


This is what it shows when the number can be stored:

trace-cmd-4980 [002] 32784.303699: lock_acquire: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
trace-cmd-4980 [002] 32784.303700: lock_release: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
CPU:1 [415 EVENTS DROPPED]
trace-cmd-4981 [001] 32784.303701: lock_release: 0xffff88000c49d7d0 &ei->truncate_mutex
trace-cmd-4981 [001] 32784.303704: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
trace-cmd-4980 [002] 32784.303705: lock_acquire: 0xffff88003c0719d8 &(&bgl->locks[i].lock)->rlock

This is what it shows when the number can not:

<idle>-0 [002] 32784.304459: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
<idle>-0 [002] 32784.304464: lock_release: 0xffffffff816591e0 rcu_read_lock
CPU:1 [EVENTS DROPPED]
ls-4984 [001] 32784.304471: lock_acquire: 0xffff88003f8b8018 &(&base->lock)->rlock
ls-4984 [001] 32784.304471: lock_release: 0xffff88003f8b8018 &(&base->lock)->rlock
ls-4984 [001] 32784.304471: softirq_exit: vec=1 [action=TIMER]


The problem with storing it later is that there is no guarantee that it
will be stored later. How far should the output look for that number. By
delaying when to print it, the output would need to search for that
number.

What happens if events were lost again before we found the number? Do we
add them together? Then we wont know how many events were lost between
the two spots where the events were lost. That is, if 448 events were
lost but we could not store the number, and then 318 events were lost,
and now we can store the number, do we add them together and report 766
events lost?

The end of the report still gives the total number lost, as it has
always done. Now I'm trying to report where they are lost.

Also note, in practice, the majority of the time we can record the
number:

$ trace-cmd report | grep EVENTS
CPU:1 [82 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:2 [75 EVENTS DROPPED]
CPU:1 [86 EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:1 [177 EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:0 [73 EVENTS DROPPED]
CPU:1 [163 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:2 [89 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [77 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:1 [87 EVENTS DROPPED]
CPU:2 [84 EVENTS DROPPED]
CPU:2 [82 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:1 [89 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:2 [164 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:0 [89 EVENTS DROPPED]


If we really want this number every time, then we need to add it to the
sub buffer header, and not hope there was room to store it.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/