Re: Re: RFC: paravirtualizing perf_clock

From: Masami Hiramatsu
Date: Wed Oct 30 2013 - 01:59:53 EST


(2013/10/29 11:58), David Ahern wrote:
> On 10/28/13 7:15 AM, Peter Zijlstra wrote:
>>> Any suggestions on how to do this and without impacting performance. I
>>> noticed the MSR path seems to take about twice as long as the current
>>> implementation (which I believe results in rdtsc in the VM for x86 with
>>> stable TSC).
>>
>> So assuming all the TSCs are in fact stable; you could implement this by
>> syncing up the guest TSC to the host TSC on guest boot. I don't think
>> anything _should_ rely on the absolute TSC value.
>>
>> Of course you then also need to make sure the host and guest tsc
>> multipliers (cyc2ns) are identical, you can play games with
>> cyc2ns_offset if you're brave.
>>
>
> This and the method Gleb mentioned both are going to be complex and
> fragile -- based on assumptions about how the perf_clock timestamps are
> generated. For example, 489223e assumes you have the tracepoint enabled
> at VM start with some means of capturing the data (e.g., a perf-session
> active). In both cases the end result requires piecing together and
> re-generating the VM's timestamp on the events. For perf this means
> either modifying the tool to take parameters and an algorithm on how to
> modify the timestamp or a homegrown tool to regenerate the file with
> updated timestamps.
>
> To back out a bit, my end goal is to be able to create and merge
> perf-events from any context on a KVM-based host -- guest userspace,
> guest kernel space, host userspace and host kernel space (userspace
> events with a perf-clock timestamp is another topic ;-)).

That is almost the same as what we (Yoshihiro and I) are trying to do with
integrated tracing. We are doing it with ftrace and trace-cmd (though it may
eventually work with perf-ftrace as well).

> Having the
> events generated with the proper timestamp is a simpler approach than
> trying to collect various tidbits of data, massage timestamps (and
> hoping the clock source hasn't changed) and then merge events.

Yeah, if possible, we'd like to use it too.

>
> And then for the cherry on top a design that works across architectures
> (e.g., x86 now, but arm later).

I think your proposal is good as the default implementation, since it doesn't
depend on arch-specific features. However, since physical timer (clock)
interfaces and virtualization interfaces depend strongly on the arch,
I guess the optimized implementations will differ for each arch.
For example, maybe we can export the tsc-offset to the guest to adjust the
clock on x86, but not on ARM or on other devices. In that case, until an
optimized one is implemented, we can use the paravirt perf_clock.
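
To make the tsc-offset idea concrete, here is a rough sketch of the
arithmetic only. It is not an existing KVM or perf interface: the struct
and function names are hypothetical, and it assumes a stable TSC, no TSC
scaling, and that the host's cyc2ns-style multiplier, shift and offset
have somehow been exported to the guest.

```c
#include <stdint.h>

/* Hypothetical parameters exported by the host (illustration only). */
struct pv_clock_params {
	int64_t  tsc_offset;  /* guest TSC = host TSC + tsc_offset */
	uint32_t mul;         /* host cycles-to-ns multiplier */
	uint32_t shift;       /* host cycles-to-ns shift, assumed <= 32 */
	uint64_t ns_offset;   /* host cyc2ns-style offset, in ns */
};

/* Undo the offset the hardware adds when the guest reads the TSC. */
static uint64_t guest_to_host_tsc(uint64_t guest_tsc, int64_t tsc_offset)
{
	/* Unsigned wrap-around handles negative offsets correctly. */
	return guest_tsc - (uint64_t)tsc_offset;
}

/*
 * (cycles * mul) >> shift, computed in two 32-bit halves so the
 * intermediate product does not overflow 64 bits for large cycles.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mul, uint32_t shift)
{
	uint64_t lo = (uint32_t)cycles;
	uint64_t hi = cycles >> 32;
	uint64_t ns = (lo * mul) >> shift;

	if (hi)
		ns += (hi * mul) << (32 - shift);
	return ns;
}

/* Timestamp a guest TSC reading in the host's clock domain. */
static uint64_t guest_tsc_to_host_ns(uint64_t guest_tsc,
				     const struct pv_clock_params *p)
{
	uint64_t host_tsc = guest_to_host_tsc(guest_tsc, p->tsc_offset);

	return cycles_to_ns(host_tsc, p->mul, p->shift) + p->ns_offset;
}
```

With tsc_offset = -1000, mul = 512 and shift = 10 (i.e. 0.5 ns per cycle),
a guest reading of 5000 corresponds to host cycle 6000, i.e. 3000 ns plus
ns_offset. This only works if both sides use identical multipliers, as
Peter pointed out.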

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx

