Re: [PATCH RFC] kvm: enable irq injection from interrupt context

From: Gleb Natapov
Date: Thu Sep 16 2010 - 07:18:06 EST


On Thu, Sep 16, 2010 at 12:53:52PM +0200, Michael S. Tsirkin wrote:
> On Thu, Sep 16, 2010 at 12:54:03PM +0200, Gleb Natapov wrote:
> > On Thu, Sep 16, 2010 at 12:44:55PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Sep 16, 2010 at 12:20:47PM +0200, Gleb Natapov wrote:
> > > > On Thu, Sep 16, 2010 at 12:13:39PM +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 16, 2010 at 12:13:32PM +0200, Gleb Natapov wrote:
> > > > > > On Thu, Sep 16, 2010 at 11:53:10AM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Sep 16, 2010 at 11:46:03AM +0200, Avi Kivity wrote:
> > > > > > > > On 09/16/2010 11:25 AM, Gleb Natapov wrote:
> > > > > > > > >>
> > > > > > > > >> MSI only appeared in rhel6, older guests still use level interrupts.
> > > > > > > > >So they are already slow for other reasons.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Exactly, for example they need to exit to userspace to ack the
> > > > > > > > interrupt. That's far slower than the workqueue.
> > > > > > >
> > > > > > > Well, this is not exactly comparable: you might get
> > > > > > > same irq asserted multiple times and only deasserted once.
> > > > > > >
> > > > > > Are we talking about level interrupts? Why would you assert level
> > > > > > triggered interrupt multiple times before deasserting it?
> > > > >
> > > > > User of irqfd has no way to know what current interrupt level is.
> > > > > So it has to keep asserting.
> > > > >
> > > > Why can't it keep track of current level?
> > >
> > > This breaks the model: eventfd user is unaware of PCI, levels and such:
> > > it just signals the event. Remember that asserts are done from e.g. vhost-net,
> > > deasserts need to be handled by qemu.
> > >
> > eventfd user implements HW and it knows exactly what type of interrupt
> > this HW generates.
>
> We have two users: qemu does deasserts, vhost-net does asserts.
Well, this is broken. You want KVM to track the level for you, and this is
wrong. KVM does this anyway because it can't rely on the device model
to behave correctly [0], but in your case it is designed to behave
incorrectly.

Interrupt type is a device property. PCI devices just happen to be level
triggered according to the PCI spec. What if you want to use vhost-net to
implement a network device which has an active-low interrupt line? [1]

If you want to split the part that asserts the irq from the part that
de-asserts it, then we should have an irqfd that tracks line status and
knows the interrupt line polarity.
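
To illustrate what "tracks line status and knows polarity" could mean,
here is a minimal userspace sketch. It is not the KVM irqfd API: every
name in it (level_irqfd, level_irqfd_set, irq_polarity and the fields) is
hypothetical, and the counter only stands in for an actual injection.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is NOT the KVM irqfd interface. */
enum irq_polarity { IRQ_ACTIVE_HIGH, IRQ_ACTIVE_LOW };

struct level_irqfd {
    enum irq_polarity polarity; /* device property, fixed at setup */
    bool asserted;              /* logical line status tracked here */
    int last_level;             /* electrical level last driven (0 or 1) */
    int injections;             /* number of real level changes forwarded */
};

static void level_irqfd_init(struct level_irqfd *fd, enum irq_polarity pol)
{
    fd->polarity = pol;
    fd->asserted = false;
    /* An idle active-low line rests at 1, an active-high line at 0. */
    fd->last_level = (pol == IRQ_ACTIVE_LOW) ? 1 : 0;
    fd->injections = 0;
}

/*
 * Request a logical assert (true) or de-assert (false).
 * Returns true only if the line status changed, so repeated asserts
 * from a caller that does not know the current level are filtered out.
 */
static bool level_irqfd_set(struct level_irqfd *fd, bool want_asserted)
{
    if (fd->asserted == want_asserted)
        return false;

    fd->asserted = want_asserted;
    /* Translate logical status to electrical level using the polarity. */
    fd->last_level = (fd->polarity == IRQ_ACTIVE_HIGH) ? want_asserted
                                                       : !want_asserted;
    fd->injections++; /* stand-in for forwarding the change */
    return true;
}
```

With something of this shape, the asserting side and the de-asserting
side could both signal blindly, and only real transitions would be
forwarded.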

> Another application is out of process virtio (sandboxing!).
It will still assert and de-assert the irq in the same code, so it will be
able to track irq line status.

> Again, pci stuff needs to stay in qemu.
>

Nothing to do with PCI whatsoever.

[0] most qemu devices behave incorrectly and trigger a level irq more than
needed.
[1] this is how a correct PCI device should behave, but we override
polarity in ACPI; now the incorrect behaviour is deeply designed
into vhost-net.

--
Gleb.