Re: [RFC] para virt interface of perf to support kvm guest os statistics collection in guest os

From: Avi Kivity
Date: Wed Jun 09 2010 - 05:46:41 EST

On 06/09/2010 12:30 PM, Zhang, Yanmin wrote:
On Wed, 2010-06-09 at 11:59 +0300, Avi Kivity wrote:
On 06/09/2010 06:30 AM, Zhang, Yanmin wrote:
From: Zhang, Yanmin<yanmin_zhang@xxxxxxxxxxxxxxx>

Based on Ingo's idea, I implemented a para virt interface for perf to support
statistics collection in guest os. That means we could run the perf tool in guest
os directly.

Great thanks to Peter Zijlstra. He is really the architect and gave me architecture
design suggestions. I also want to thank Yangsheng and LinMing for their generous

The design is:

1) Add a kvm_pmu whose callbacks mostly just issue a hypercall that vmexits to the host kernel;
2) Create a host perf_event per guest perf_event;
3) Host kernel syncs perf_event count/overflow data changes to the guest perf_event
when processing perf_event overflows after an NMI arrives. Host kernel injects an NMI into the guest
kernel if a guest event overflows.
4) Guest kernel goes through all enabled events on the current cpu and outputs data when they
5) No change in user space.
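
As a rough illustration of steps 1) to 4), here is a minimal user-space sketch in C. It is not taken from the patch: every name in it (pv_op, guest_event, host_event, host_handle_hypercall, host_count, guest_nmi_handler and so on) is hypothetical, the hypercall is modelled as a plain function call, and the NMI injection is modelled as a flag. It only shows the data flow: the guest callbacks forward to the host, the host keeps one peer per guest event, and on overflow the host syncs the count and flags an NMI for the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 4

/* Hypothetical operation codes; not the ABI of the actual patch. */
enum pv_op { PV_OPEN, PV_ENABLE, PV_DISABLE, PV_CLOSE };

/* Guest-visible state that the host writes into (step 3). */
struct guest_event {
    uint64_t count;
    uint32_t overflows;   /* synced by host, not yet consumed by guest */
    bool enabled;
};

/* Host-side peer: one per guest event (step 2). */
struct host_event {
    struct guest_event *peer;
    uint64_t count;
    uint64_t period;
    uint64_t next_overflow;
    bool enabled;
};

static struct host_event host_events[MAX_EVENTS];
static bool guest_nmi_pending;   /* stands in for NMI injection */

/* Host handler, reached through a vmexit in a real system.
 * Returns the slot id for PV_OPEN, 0 on success, -1 on error. */
int host_handle_hypercall(enum pv_op op, int id,
                          struct guest_event *ge, uint64_t period)
{
    if (op == PV_OPEN) {
        if (ge == NULL || period == 0)
            return -1;
        for (int i = 0; i < MAX_EVENTS; i++) {
            if (host_events[i].peer == NULL) {
                host_events[i] = (struct host_event){
                    .peer = ge, .period = period, .next_overflow = period };
                return i;
            }
        }
        return -1;   /* no free host event */
    }
    if (id < 0 || id >= MAX_EVENTS || host_events[id].peer == NULL)
        return -1;

    struct host_event *he = &host_events[id];
    switch (op) {
    case PV_ENABLE:  he->enabled = true;  he->peer->enabled = true;  break;
    case PV_DISABLE: he->enabled = false; he->peer->enabled = false; break;
    case PV_CLOSE:   *he = (struct host_event){0};                   break;
    default:         return -1;
    }
    return 0;
}

/* Guest pmu callbacks: mostly just a hypercall (step 1). */
int guest_pmu_open(struct guest_event *ge, uint64_t period)
{
    return host_handle_hypercall(PV_OPEN, -1, ge, period);
}

int guest_pmu_enable(int id)
{
    return host_handle_hypercall(PV_ENABLE, id, NULL, 0);
}

/* Host side: the event counted `delta` more; on overflow, sync the
 * count to the guest event and flag an NMI for the guest (step 3). */
void host_count(int id, uint64_t delta)
{
    assert(id >= 0 && id < MAX_EVENTS);
    struct host_event *he = &host_events[id];
    bool overflowed = false;

    if (he->peer == NULL || !he->enabled)
        return;
    he->count += delta;
    while (he->count >= he->next_overflow) {
        he->next_overflow += he->period;
        he->peer->overflows++;
        overflowed = true;
    }
    if (overflowed) {
        he->peer->count = he->count;
        guest_nmi_pending = true;
    }
}

/* Guest NMI handler: walk the enabled events and consume their
 * pending overflows (step 4). Returns the number of samples output. */
unsigned guest_nmi_handler(struct guest_event *events[], size_t n)
{
    unsigned samples = 0;

    guest_nmi_pending = false;
    for (size_t i = 0; i < n; i++) {
        if (events[i]->enabled) {
            samples += events[i]->overflows;
            events[i]->overflows = 0;
        }
    }
    return samples;
}
```

In this sketch the fixed-size host_events array also acts as a simple limit on how many host events one guest can hold open.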

Other issues:

- save/restore support for live migration
Well, it's a little hard to handle perf_event under live migration.
I will check it.

It's probably the biggest benefit of paravirt PMU over non-paravirt PMU, and live migration is one of the most important features of virtualization. So we really need to get this working.

- some way to limit the number of open handles (comes automatically with
the table approach I suggested earlier)
Current perf doesn't restrict the number of perf_events. The kernel rotates them to collect
statistics for all perf_events.

We must have some restriction, since we consume host resources for each perf_event.

My patch just follows this style.
The table method might not be a good fit because of the following scenario:
a guest perf_event might be a per-task event on the guest side. When the guest application task is
migrated to another cpu, the perf_event peer on the host side should also be migrated to the new vcpu
thread. With the table method, we need to rearrange the table when event migration happens.
The migration I mention here is not guest live migration.

Yes. But the code for that already exists, no? Real hardware has limited resources so perf multiplexes unlimited user perf_events on limited hardware perf_events. The same can happen here, perhaps with a larger limit.
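
To make the multiplexing point concrete, here is a minimal sketch in C of round-robin scheduling of many events onto a limited number of slots, with the usual time_enabled/time_running scaling to estimate the full count. It is an illustration only, not the kernel's code: mux_event, mux_tick and mux_scaled_count are hypothetical names, a tick stands in for one rotation interval, and the assumption that every running event counts `rate` per tick is made purely for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event bookkeeping for the sketch. */
struct mux_event {
    uint64_t count;         /* raw count while actually scheduled */
    uint64_t time_enabled;  /* ticks the event was enabled */
    uint64_t time_running;  /* ticks the event held a slot */
};

/* One rotation interval: the `slots` events starting at *head run and
 * each counts `rate`; then the head advances so the others get a turn. */
void mux_tick(struct mux_event *ev, size_t n, size_t slots,
              size_t *head, uint64_t rate)
{
    assert(n > 0 && *head < n);
    size_t running = slots < n ? slots : n;

    for (size_t i = 0; i < n; i++)
        ev[i].time_enabled++;
    for (size_t k = 0; k < running; k++) {
        struct mux_event *e = &ev[(*head + k) % n];
        e->time_running++;
        e->count += rate;
    }
    *head = (*head + running) % n;
}

/* Estimate what the count would have been had the event run all the time. */
uint64_t mux_scaled_count(const struct mux_event *e)
{
    if (e->time_running == 0)
        return 0;
    return e->count * e->time_enabled / e->time_running;
}
```

With four events sharing two slots, each event runs half of the time and its scaled count is twice its raw count. A per-guest limit would simply cap `slots`, while the number of events the guest opens stays independent of it.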

error compiling committee.c: too many arguments to function
