Re: [RFC] perf: need to expose sched_clock to correlate user sampleswith kernel samples

From: David Ahern
Date: Wed Jun 26 2013 - 12:50:03 EST


With all the perf ioctl extensions tossed out the past day or so I wanted to revive this request. Still need a solution to the problem of correlating perf_clock to other clocks ...

On 2/1/13 7:18 AM, Pawel Moll wrote:
Hello,

I'd like to revive the topic...

On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
Hi,

There are many situations where we want to correlate events happening at
the user level with samples recorded in the perf_event kernel sampling buffer.
For instance, we might want to correlate the call to a function or creation of
a file with samples. Similarly, when we want to monitor a JVM with jitted code,
we need to be able to correlate jitted code mappings with perf event samples
for symbolization.

Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
That causes each PERF_RECORD_SAMPLE to include a timestamp
generated by calling the local_clock() -> sched_clock_cpu() function.

To make correlating user vs. kernel samples easy, we would need to
access that sched_clock() functionality. However, none of the existing
clock calls permit this at this point. They all return timestamps which are
not using the same source and/or offset as sched_clock.

I believe a similar issue exists with the ftrace subsystem.

The problem needs to be adressed in a portable manner. Solutions
based on reading TSC for the user level to reconstruct sched_clock()
don't seem appropriate to me.

One possibility to address this limitation would be to extend clock_gettime()
with a new clock time, e.g., CLOCK_PERF.

However, I understand that sched_clock_cpu() provides ordering guarantees only
when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
But we already have to deal with this problem when merging samples obtained
from different CPU sampling buffer in per-thread mode. So this is not
necessarily
a showstopper.

Alternatives could be to use uprobes but that's less practical to setup.

Anyone with better ideas?

You forgot to CC the time people ;-)

I've no problem with adding CLOCK_PERF (or another/better name).

Thomas, John?

I've just faced the same issue - correlating an event in userspace with
data from the perf stream, but to my mind what I want to get is a value
returned by perf_clock() _in the current "session" context_.

Stephane didn't like the idea of opening a "fake" perf descriptor in
order to get the timestamp, but surely one must have the "session"
already running to be interested in such data in the first place? So I
think the ioctl() idea is not out of place here... How about the simple
change below?

Regards

Pawel

8<---
From 2ad51a27fbf64bf98cee190efc3fbd7002819692 Mon Sep 17 00:00:00 2001
From: Pawel Moll <pawel.moll@xxxxxxx>
Date: Fri, 1 Feb 2013 14:03:56 +0000
Subject: [PATCH] perf: Add ioctl to return current time value

To co-relate user space events with the perf events stream
a current (as in: "what time(stamp) is it now?") time value
must be made available.

This patch adds a perf ioctl that makes this possible.

Signed-off-by: Pawel Moll <pawel.moll@xxxxxxx>
---
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 8 ++++++++
2 files changed, 9 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..b745fb0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -316,6 +316,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
+#define PERF_EVENT_IOC_GET_TIME _IOR('$', 7, __u64)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 301079d..4202b1c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3298,6 +3298,14 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_SET_FILTER:
return perf_event_set_filter(event, (void __user *)arg);

+ case PERF_EVENT_IOC_GET_TIME:
+ {
+ u64 time = perf_clock();
+ if (copy_to_user((void __user *)arg, &time, sizeof(time)))
+ return -EFAULT;
+ return 0;
+ }
+
default:
return -ENOTTY;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/