Re: [patch 0/3] kvm tool: Serial emulation overhaul
From: Avi Kivity
Date: Tue Dec 13 2011 - 05:58:39 EST
On 12/13/2011 02:59 AM, Thomas Gleixner wrote:
<snip trace>
> Why the heck is a paravirtualized guest using a local APIC timer
> emulation, instead of a paravirtualized clock event device?
>
> Just look at the trace. That's insane. We enter the guest for 2us to
> come back and handle the APIC_EOI for 11us. Then we go back to the
> guest for 9us and spend again 11us for handling a write to APIC_TMICT.
>
> That's 11us guest vs. 22us host time.
Run your guest with x2apic enabled, the timing will be very different.
You'll still have an exit for APIC_TMICT and APIC_EOI, but they'll be
much faster. It's possible to avoid the EOI exit with some paravirt
magic, but that has its own issues.
> Aside from that, when looking at the bootup, the guest "calibrates" the
> local APIC timer emulation against an emulated legacy device to figure
> out the APIC timer clock rate, which is totally irrelevant for a
> paravirtualized guest, if done right.
>
> Look how a guest timer is programmed:
>
> hrtimer_start();
>   ...
>   clock_events_programm_event(dev, expires, now);
>     ns_delta = expires - now;
>     delta = convert_ns_to_dev(ns_delta, dev);
>     dev->set_next_event(delta, dev);
>       lapic_next_event(delta, dev);
>         apic_write(APIC_TMICT, delta);
>           |
>           ---> traps into host
>                kvm_mmu_pagetable_walk();
>                kvm_mmio_emulation();
>                kvm_apic_emulation();
>                  start_apic_timer();
>                    now = get_host_time();
>                    delta = convert_apic_to_ns(APIC_TMICT);
>                    hrtimer_start(apic_timer, now + delta, HRTIMER_MODE_ABS);
>
> Oh well, we
>
> - convert from nsec to a "calibrated" APIC delta
> - "program" the APIC timer
> - trap into the host
> - convert the "calibrated" delta back to nsec
> - add it to the current host time
> - arm the timer
>
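To make the round trip in the list above concrete, here is a minimal
user-space C sketch of the two conversions. It is not kernel code: the
function names and APIC_TIMER_HZ (62.5 MHz, i.e. 16 ns per tick) are
assumptions picked for illustration, and the kernel's clockevents code
scales with a mult/shift pair rather than a division. The only point is
that nanoseconds are scaled to ticks in the guest and scaled straight
back in the host, with rounding on both legs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: calibrated APIC timer rate. The real value comes out of
 * the guest's boot-time calibration and is not given in this thread. */
#define APIC_TIMER_HZ 62500000ULL   /* 62.5 MHz => 16 ns per tick */
#define NSEC_PER_SEC  1000000000ULL

/* Guest side (hypothetical helper): nanoseconds -> APIC ticks, rounds down. */
static inline uint64_t guest_ns_to_apic_ticks(uint64_t ns)
{
    assert(ns <= UINT64_MAX / APIC_TIMER_HZ);   /* keep the product in range */
    return ns * APIC_TIMER_HZ / NSEC_PER_SEC;
}

/* Host side (hypothetical helper): APIC ticks -> nanoseconds, rounds down. */
static inline uint64_t host_apic_ticks_to_ns(uint64_t ticks)
{
    assert(ticks <= UINT64_MAX / NSEC_PER_SEC);
    return ticks * NSEC_PER_SEC / APIC_TIMER_HZ;
}

/* Whole path: returns the absolute host time the host timer is armed for. */
static inline uint64_t host_expiry_via_apic(uint64_t guest_delta_ns,
                                            uint64_t host_now_ns)
{
    /* guest: convert nsec to a "calibrated" APIC delta */
    uint64_t ticks = guest_ns_to_apic_ticks(guest_delta_ns);

    /* ... the write to APIC_TMICT traps into the host here ... */

    /* host: convert the delta back to nsec and add the current host time */
    return host_now_ns + host_apic_ticks_to_ns(ticks);
}
```

With these assumed numbers, a 1000 ns request becomes 62 ticks in the
guest and is armed on the host as 992 ns, so the double conversion costs
cycles and precision without adding any information.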
> Why the heck don't we use a paravirt device, which just provides a
> nsec based interface? The host knows the time delta between the guest's
> notion of CLOCK_MONOTONIC and its own.
We do have a paravirt clocksource, just not clockevents.
> That would reduce the whole
> procedure to:
>
> hrtimer_start();
>   ...
>   clock_events_programm_event(dev, expires, now);
>     dev->set_next_ktime(expires, dev);
>       kvm_clock_event_set_next(expires, dev);
>         |
>         ---> traps into host with a paravirt call
>              kvm_handle_guest_clkev_dev();
>                hrtimer_start(apic_timer, expires + host_guest_delta, HRTIMER_MODE_ABS);
>
> That would save tons of time on a hot path. Even if the
> host_guest_delta approach does not work, a 1:1 nsec mapping as a
> relative timer on the host would be way faster than the current
> solution.
>
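The host-side arithmetic of the proposed paravirt path is small enough
to sketch in plain C. This is an illustration only: struct
pv_clkev_state and both helpers are hypothetical names, no such
interface exists in this thread, and host_guest_delta is assumed to be
(host CLOCK_MONOTONIC - guest CLOCK_MONOTONIC) in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-guest state kept by the host. */
struct pv_clkev_state {
    int64_t host_guest_delta_ns;   /* assumed: host monotonic - guest monotonic */
};

/* Absolute variant: the guest hands over its own absolute expiry and
 * the host only has to shift it into host time before arming. */
static inline uint64_t pv_expiry_absolute(const struct pv_clkev_state *s,
                                          uint64_t guest_expires_ns)
{
    return (uint64_t)((int64_t)guest_expires_ns + s->host_guest_delta_ns);
}

/* Fallback variant: 1:1 nsec mapping, the guest hands over a relative
 * delta and the host arms a relative timer from its own "now". */
static inline uint64_t pv_expiry_relative(uint64_t host_now_ns,
                                          uint64_t guest_delta_ns)
{
    return host_now_ns + guest_delta_ns;
}
```

Either variant is a single addition on the host, with no tick
conversion and no calibration step in the guest.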
The problem with paravirt clockevents is that if/when the APIC becomes
virtualized, then guests which were started with the paravirt
clockevents don't get accelerated when they are migrated onto newer
hardware. This problem has bitten us several times in the past; if you
want to see how it looks when applied on a large scale look at Xen -
they have a paravirt-the-fsck-out-of-everything mode and a full virt
mode (which should be way faster these days); the two aren't
compatible. Of course back when they started, they didn't have a
choice, but we do.
--
error compiling committee.c: too many arguments to function