Re: [RFC] Next gen kvm api

From: Gleb Natapov
Date: Sun Feb 05 2012 - 05:58:07 EST


On Sun, Feb 05, 2012 at 11:56:21AM +0200, Avi Kivity wrote:
> On 02/05/2012 11:51 AM, Gleb Natapov wrote:
> > On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote:
> > > On 02/05/2012 11:37 AM, Gleb Natapov wrote:
> > > > On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote:
> > > > > Device model
> > > > > ------------
> > > > > Currently kvm virtualizes or emulates a set of x86 cores, with or
> > > > > without local APICs, a 24-input IOAPIC, a PIC, a PIT, and a number of
> > > > > PCI devices assigned from the host. The API allows emulating the local
> > > > > APICs in userspace.
> > > > >
> > > > > The new API will do away with the IOAPIC/PIC/PIT emulation and defer
> > > > > them to userspace. Note: this may cause a regression for older guests
> > > > > that don't support MSI or kvmclock. Device assignment will be done
> > > > > using VFIO, that is, without direct kvm involvement.
> > > > >
> > > > So are we officially saying that KVM is only for modern guest
> > > > virtualization?
> > >
> > > No, but older guests may have reduced performance in some workloads
> > > (e.g. RHEL4 gettimeofday() intensive workloads).
> > >
> > Reduced performance is what I mean. Obviously old guests will continue working.
>
> I'm not happy about it either.
>
It is not only about old guests either. In RHEL we pretend not to
support HPET because some guests, once they detect it, access its
MMIO frequently under certain workloads. For Linux guests we can
avoid that by using kvmclock. For Windows guests I hope we will have
enlightenment timers + RTC, but what about other guests? *BSD? How often
do they access HPET when it is available? We will probably have to move
HPET into the kernel if we want to make it usable.

So what are the criteria for a device to be emulated in userspace vs.
kernelspace in the new API? Never? What about vhost-net then? Only if
a device works in MSI mode? This may work for the HPET case, but it
looks like an artificial limitation, since the problem with HPET is not
interrupt latency but MMIO space access.

And BTW, what about enlightenment timers for Windows? Are we going to
implement them in userspace or in the kernel?

> > > > Also my not so old host kernel uses MSI only for NIC.
> > > > SATA and USB are using IOAPIC (though this is probably more HW related
> > > > than kernel version related).
> > >
> > > For devices emulated in userspace, it doesn't matter where the IOAPIC
> > > is. It only matters for kernel provided devices (PIT, assigned devices,
> > > vhost-net).
> > >
> > What about EOI that will have to do additional exit to userspace for each
> > interrupt delivered?
>
> I think the ioapic EOI is asynchronous wrt the core, yes? So the vcpu
Probably. I do not see what problem an async EOI could cause.

> can just post the EOI broadcast on the apic-bus socketpair, waking up
> the thread handling the ioapic, and continue running. This trades off
> vcpu latency for using more host resources.
>
Sounds good. This will increase IOAPIC interrupt latency though, since
the next interrupt (same GSI) can't be delivered until the EOI is
processed.
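
To make the latency concern concrete, here is a toy sketch. It is not
KVM or QEMU code. It assumes a level-triggered line, a userspace IOAPIC
that tracks a per-GSI "remote IRR" bit, and an EOI message that carries
the GSI directly (a real EOI broadcast carries a vector). ToyIoapic,
vcpu_post_eoi and demo are hypothetical names used only for
illustration. The vcpu side posts the EOI on a socketpair and keeps
running; a second interrupt on the same GSI stays held until the IOAPIC
side actually reads that EOI.

```python
import socket
import struct


class ToyIoapic:
    """Toy userspace IOAPIC model for level-triggered lines (hypothetical)."""

    def __init__(self, eoi_sock):
        self.eoi_sock = eoi_sock   # IOAPIC end of the "apic-bus" socketpair
        self.remote_irr = set()    # GSIs delivered but not yet EOI'd
        self.pending = set()       # GSIs raised while remote IRR was set
        self.delivered = []        # GSIs in the order they were delivered

    def raise_irq(self, gsi):
        """Deliver gsi now, or hold it if the previous one awaits its EOI."""
        if gsi in self.remote_irr:
            self.pending.add(gsi)
            return False
        self.remote_irr.add(gsi)
        self.delivered.append(gsi)
        return True

    def process_one_eoi(self):
        """Read one posted EOI, clear remote IRR, re-deliver if held."""
        (gsi,) = struct.unpack("<I", self.eoi_sock.recv(4))
        self.remote_irr.discard(gsi)
        if gsi in self.pending:
            self.pending.discard(gsi)
            self.raise_irq(gsi)


def vcpu_post_eoi(sock, gsi):
    """vcpu side: post the EOI and return immediately (no waiting)."""
    sock.sendall(struct.pack("<I", gsi))


def demo():
    vcpu_end, ioapic_end = socket.socketpair()
    ioapic = ToyIoapic(ioapic_end)
    first = ioapic.raise_irq(10)    # delivered immediately
    vcpu_post_eoi(vcpu_end, 10)     # EOI posted; vcpu keeps running
    second = ioapic.raise_irq(10)   # held: EOI not processed yet
    ioapic.process_one_eoi()        # IOAPIC thread finally handles the EOI
    vcpu_end.close()
    ioapic_end.close()
    return first, second, ioapic.delivered
```

In this model the gap between vcpu_post_eoi() and process_one_eoi() is
the extra latency: the vcpu is not blocked, but the second interrupt on
GSI 10 waits for the IOAPIC thread to be scheduled.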

--
Gleb.