Re: RFC: paravirtualizing perf_clock

From: David Ahern
Date: Wed Oct 30 2013 - 10:03:54 EST

On 10/29/13 11:59 PM, Masami Hiramatsu wrote:
> (2013/10/29 11:58), David Ahern wrote:
>> To back out a bit, my end goal is to be able to create and merge
>> perf-events from any context on a KVM-based host -- guest userspace,
>> guest kernel space, host userspace and host kernel space (userspace
>> events with a perf-clock timestamp is another topic ;-)).
>
> That is almost the same as what we (Yoshihiro and I) are trying with integrated
> tracing; we are doing it on ftrace and trace-cmd (but perhaps it will eventually
> work on perf-ftrace).

I thought at this point (well, once perf-ftrace gets committed) that you can do everything with perf. What feature is missing in perf that you get with trace-cmd or using debugfs directly?

>> And then for the cherry on top a design that works across architectures
>> (e.g., x86 now, but arm later).
>
> I think your proposal is good for the default implementation; it doesn't
> depend on arch-specific features. However, since physical timer (clock)
> interfaces and virtualization interfaces strongly depend on the arch,
> I guess the optimized implementations will differ on each arch.
> For example, maybe we can export tsc-offset to the guest to adjust the clock
> on x86, but not on ARM or other devices. In that case, until an optimized
> one is implemented, we can use the paravirt perf_clock.
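
For illustration only, the tsc-offset idea mentioned above could look roughly like the sketch below. It assumes the usual x86 relationship guest_tsc = host_tsc + tsc_offset with no TSC scaling, and that the host has somehow exported the offset to the guest. The struct and function names are hypothetical, not an existing kernel interface.

```c
#include <stdint.h>

/* Hypothetical: per-guest data the host would have to export. */
struct pv_tsc_info {
	int64_t tsc_offset;	/* assumed: guest_tsc = host_tsc + tsc_offset */
};

/* Convert a TSC value read in the guest to the host's TSC timeline. */
static inline uint64_t guest_tsc_to_host(const struct pv_tsc_info *info,
					 uint64_t guest_tsc)
{
	/* Unsigned arithmetic wraps modulo 2^64, so a negative offset works too. */
	return guest_tsc - (uint64_t)info->tsc_offset;
}
```

With an offset of -1000, a guest reading of 5000 maps back to 6000 on the host timeline. Converting TSC cycles to perf_clock nanoseconds is a separate step and is not shown here.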

So this MSR read takes about 1.6 usecs (from 'perf kvm stat live'), and that is the total time between VMEXIT and VMENTRY. The time it takes to run perf_clock in the host should be a very small part of that 1.6 usecs. I'll take a look at the TSC path to see how it is optimized (suggestions appreciated).
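
To put the 1.6 usec figure in perspective, here is some back-of-the-envelope arithmetic as a small C helper. It assumes exactly one MSR read per event and takes the event rate as a made-up input; the names are hypothetical.

```c
#include <stdint.h>

/* Measured above: about 1.6 usec (1600 ns) per MSR read, VMEXIT to VMENTRY. */
#define PV_CLOCK_READ_NS 1600ULL

/*
 * Nanoseconds of guest time spent reading the clock per second of wall time,
 * assuming one read per event.
 */
static inline uint64_t pv_clock_overhead_ns_per_sec(uint64_t events_per_sec)
{
	return events_per_sec * PV_CLOCK_READ_NS;
}
```

At 1,000 events/sec this comes to 1.6 msec per second (0.16% of one vCPU); at 100,000 events/sec it is 160 msec per second (16%).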

Another thought is to make the use of pv_perf_clock an option -- the user can knowingly decide whether the additional latency/overhead is worth the feature.
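
Purely as a sketch of what such an option might look like: the clock source is picked once through a function pointer, defaulting to the local clock. The knob and the two stub clock functions are hypothetical stand-ins that return constants; real versions would read the local clock or trap to the host.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*perf_clock_fn)(void);

/* Stand-in stubs only; they return fixed values so the sketch is testable. */
static uint64_t local_perf_clock(void) { return 1; }
static uint64_t pv_perf_clock(void)    { return 2; }

/*
 * Hypothetical knob: the caller passes false by default so that nobody
 * pays the VMEXIT cost without asking for it.
 */
static perf_clock_fn select_perf_clock(bool use_pv_clock)
{
	return use_pv_clock ? pv_perf_clock : local_perf_clock;
}
```

Selecting the function once keeps the per-event path to a single indirect call, whichever clock was chosen.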

