Re: [PATCH] Perf: Correct Assumptions about Sample Timestamps inPasses

From: David Ahern
Date: Wed Jan 08 2014 - 16:48:48 EST


On 1/4/14, 8:05 AM, Frederic Weisbecker wrote:
On Fri, Jan 03, 2014 at 03:45:36PM -0700, David Ahern wrote:
On 1/3/14, 3:07 PM, Frederic Weisbecker wrote:
I'm not sure I understand why we need that. Why doesn't it work by simply flushing
events prior to the earliest timestamp among every CPUs last event?

Here's one scenario. Consider N-mmaps:

|----- t_flush
v
0 -----|---x------------------------
1 -----|----|------------------------
... |
N -----|-------ssss-|-----------------

t_start t_1 ... t_N

You start a round at some time -- t_start. By starting a round it
means you go to mmap 0 and check for events, then mmap 1, ..., mmap
N. It takes a finite amount of time to move from one mmap to
another.

Assume there are no events on mmap 0, 1, ... N-1 but samples are
generated in mmap N. In the time it takes to move forward from 0 to
N, a sample can be generated for mmap 0 and written to the buffer -
the 'x' above. It now contains a timestamp < than samples on any
other mmap and out pops the flush error.

Lets reformulate as following. I'm copy-pasting the example in session.c
but adapting it to your scenario.

* ============ PASS n =================
* CPU 0 | CPU 1
* |
* - | 2
* - | 3
* - | 4 <--- max recorded
*
* ============ PASS n + 1 ==============
* CPU 0 | CPU 1
* |
* 1 | 5
* - | 6
* - | 7 <---- max recorded
*
* Flush every events below timestamp 4
*

So in the first round, CPU 0 has no event by the time we read it. Then while
we read the events from CPU 1 (val 2,3,4), a new one comes in concurrently
in CPU 0 (val 1, which matches 's' in your example ). We missed it due to the
linear mmap read on all buffers so we'll get it on the second round.

We find it out in the second round, CPU 1 has also new events. At this time we know that
if CPU 0 had events up to timestamp 4, we should have seen all of them because
we read CPU 0 buffer in PASS n + 1 after we read CPU 1 buffer on PASS n.

Of course that's what happens in a perfect world with the assumption that ordering
is all correct, that events write/commit doesn't take too much time to complete,
that perf_clock() is globally monotonic (and it's not IIUC). But a little heuristical
correction on the timestamp barrier should prevent from issues with that.

So this is how the code behaves currently and it should handle a case like above.

The existing code does not work. Your unstable tsc patch did not work. I have not tried Joseph's patch. Are you proposing that one or do you have something else in mind?

Now there is still the problem of:

1) local timestamps not moving forward (could it happen when events happen in storm,
when they overflow multiple times in once for example, and clock is not granular
enough?)

Even at 650k events/sec I am not seeing this problem.

Anyway this should be solved with the patch that takes the earliest last event on all
CPU buffer instead of the maximum of a round as a barrier.

2) local timestamps not monotonic due to interrupting events. This could be fixed
in the kernel with moving perf_clock() snapshot in perf_output_sample().


For perf-kvm the events are all tracepoints, so there should not be a problem of overlap due to interruption.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/