Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

From: Stephane Eranian
Date: Tue Feb 05 2013 - 17:13:49 EST


On Fri, Feb 1, 2013 at 3:18 PM, Pawel Moll <pawel.moll@xxxxxxx> wrote:
> Hello,
>
> I'd like to revive the topic...
>
> On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
>> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
>> > Hi,
>> >
>> > There are many situations where we want to correlate events happening at
>> > the user level with samples recorded in the perf_event kernel sampling buffer.
>> > For instance, we might want to correlate the call to a function or creation of
>> > a file with samples. Similarly, when we want to monitor a JVM with jitted code,
>> > we need to be able to correlate jitted code mappings with perf event samples
>> > for symbolization.
>> >
>> > Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
>> > That causes each PERF_RECORD_SAMPLE to include a timestamp
>> > generated by calling the local_clock() -> sched_clock_cpu() function.
>> >
>> > To make correlating user vs. kernel samples easy, we would need to
>> > access that sched_clock() functionality. However, none of the existing
>> > clock calls permit this at this point. They all return timestamps which are
>> > not using the same source and/or offset as sched_clock.
>> >
>> > I believe a similar issue exists with the ftrace subsystem.
>> >
>> > The problem needs to be addressed in a portable manner. Solutions
>> > based on reading TSC for the user level to reconstruct sched_clock()
>> > don't seem appropriate to me.
>> >
>> > One possibility to address this limitation would be to extend clock_gettime()
>> > with a new clock time, e.g., CLOCK_PERF.
>> >
>> > However, I understand that sched_clock_cpu() provides ordering guarantees only
>> > when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
>> > But we already have to deal with this problem when merging samples obtained
>> > from different CPU sampling buffers in per-thread mode. So this is not
>> > necessarily a showstopper.
>> >
>> > Alternatives could be to use uprobes but that's less practical to set up.
>> >
>> > Anyone with better ideas?
>>
>> You forgot to CC the time people ;-)
>>
>> I've no problem with adding CLOCK_PERF (or another/better name).
>>
>> Thomas, John?
>
> I've just faced the same issue - correlating an event in userspace with
> data from the perf stream, but to my mind what I want to get is a value
> returned by perf_clock() _in the current "session" context_.
>
> Stephane didn't like the idea of opening a "fake" perf descriptor in
> order to get the timestamp, but surely one must have the "session"
> already running to be interested in such data in the first place? So I
> think the ioctl() idea is not out of place here... How about the simple
> change below?
>
The app requesting the timestamp may not necessarily have an active
perf session. And by that I mean, it may not be self-monitoring. But it
could be monitored by an external tool such as perf, without necessarily
knowing it.

The timestamp is global or at least per-cpu. It is not tied to a particular
active event.

The thing I did not like about ioctl() is that it now means that the app
needs to become a user of the perf_event API. It needs to program
a dummy event just to get a timestamp. As opposed to just calling
a clock_gettime(CLOCK_PERF) function which guarantees a clock
source identical to that used by perf_events. In that case, the app
timestamps its events in such a way that if it was monitored externally,
that external tool would be able to correlate all the samples because they
would all have the same time source.
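
To make the clock_gettime() usage concrete, here is a minimal sketch of what
the app side could look like. CLOCK_PERF is only the name proposed in this
thread and is not defined by any existing header, so the sketch aliases it to
CLOCK_MONOTONIC purely so that it builds and runs. app_timestamp_ns() is a
hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/*
 * ASSUMPTION: CLOCK_PERF is the clock id proposed in this thread; it does
 * not exist today. CLOCK_MONOTONIC is used as a stand-in so the sketch
 * compiles, but its values are NOT comparable with PERF_SAMPLE_TIME.
 */
#ifndef CLOCK_PERF
#define CLOCK_PERF CLOCK_MONOTONIC
#endif

/*
 * Hypothetical helper: return the current time of the chosen clock in
 * nanoseconds, or 0 if clock_gettime() fails.
 */
static uint64_t app_timestamp_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_PERF, &ts) != 0)
		return 0;

	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

The app would call app_timestamp_ns() whenever it logs one of its own events
(e.g., a jitted code mapping) and store the value next to it. No perf_event
file descriptor is involved at any point.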

But if people are strongly opposed to the clock_gettime() approach, then
I can go with the ioctl() because the functionality is definitely needed
ASAP.
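
For comparison, a rough sketch of what the ioctl() route would require from
the app. PERF_EVENT_IOC_GET_TIME exists only in the patch quoted below, so it
is defined locally here with the value from that patch. On a kernel without
the patch the request is either rejected or may carry a different meaning, so
the returned value is a timestamp only if the patch is applied.
open_dummy_event() and perf_session_time() are hypothetical helper names, and
the choice of a disabled software cpu-clock event as the dummy is my
assumption.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: taken from the proposed patch; not in mainline headers. */
#ifndef PERF_EVENT_IOC_GET_TIME
#define PERF_EVENT_IOC_GET_TIME _IOR('$', 7, __u64)
#endif

/*
 * Hypothetical helper: open a disabled software event on the calling
 * thread, only to obtain a perf_event file descriptor.
 * Returns the fd, or -1 with errno set.
 */
static int open_dummy_event(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.disabled = 1;	/* never counts; it only backs the fd */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid = 0 (this thread), cpu = -1 (any), no group, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/*
 * Hypothetical helper: ask the kernel for the current perf time via the
 * proposed ioctl. Returns 0 and fills *out on success, -1 otherwise
 * (*out is left untouched on failure).
 */
static int perf_session_time(int fd, uint64_t *out)
{
	__u64 t = 0;

	if (ioctl(fd, PERF_EVENT_IOC_GET_TIME, &t) != 0)
		return -1;

	*out = t;
	return 0;
}
```

The app has to keep the dummy fd open for as long as it wants timestamps and
close() it afterwards, which is the extra perf_event API dependency I was
referring to above.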



> 8<---
> From 2ad51a27fbf64bf98cee190efc3fbd7002819692 Mon Sep 17 00:00:00 2001
> From: Pawel Moll <pawel.moll@xxxxxxx>
> Date: Fri, 1 Feb 2013 14:03:56 +0000
> Subject: [PATCH] perf: Add ioctl to return current time value
>
> To co-relate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.
>
> This patch adds a perf ioctl that makes this possible.
>
> Signed-off-by: Pawel Moll <pawel.moll@xxxxxxx>
> ---
> include/uapi/linux/perf_event.h | 1 +
> kernel/events/core.c | 8 ++++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 4f63c05..b745fb0 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -316,6 +316,7 @@ struct perf_event_attr {
> #define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
> #define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
> #define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
> +#define PERF_EVENT_IOC_GET_TIME _IOR('$', 7, __u64)
>
> enum perf_event_ioc_flags {
> PERF_IOC_FLAG_GROUP = 1U << 0,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 301079d..4202b1c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3298,6 +3298,14 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case PERF_EVENT_IOC_SET_FILTER:
> return perf_event_set_filter(event, (void __user *)arg);
>
> + case PERF_EVENT_IOC_GET_TIME:
> + {
> + u64 time = perf_clock();
> + if (copy_to_user((void __user *)arg, &time, sizeof(time)))
> + return -EFAULT;
> + return 0;
> + }
> +
> default:
> return -ENOTTY;
> }
> --
> 1.7.10.4
>
>