Re: [PATCH v9 0/2] kvm: level irqfd support

From: Alex Williamson
Date: Tue Aug 21 2012 - 21:28:19 EST


On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds. As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode. For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> >
> > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > the reason is it's racy. Objects to track OADNs are made dynamically,
> > we look through existing ones for a match under spinlock and setup a
> > new one if there's no match. On teardown, we can remove the OADN from
> > the list under lock, but that same lock prevents us from de-assigning
> > the IRQ ACK notifier or waiting for an RCU grace period. We must make
> > sure that any unused GSI is de-asserted, but the above means it's
> > possible that another OADN has been created for this source ID/GSI
> > and de-asserting the GSI could lead to breakage.
>
> I do not see it. What breakage? Could you give an example please?
>
>
> I think what you are saying is last deassign must clear
> since otherwise we never will clear.
> I agree it is either that or delay deassign until ack.
>
> Can it be as simple as this (after all rcu etc dances)?
> lock irqfds
> if no oadns
> set level to 0
> unlock irqfds
> ?

lock irqfds
remove irqfd from oadn list
if no oadns
remove oadn
set gsi 0
unlock
lock irqfds
new oadn
unlock irqfds

>> EOI
ack notify new oadn
de-assert gsi
notify new oadn
>> re-assert irqfd
ack notify old oadn
de-assert gsi
notify old oadn

synchronize_rcu

kvm_unregister_irq_ack_notifier

So, because the unregister is removed from the final clear and because
we share an IRQ source ID there's a window where we can have two oadns
registered for the same GSI. The new one will de-assert and notify
while the old one has an empty list to notify, but still de-asserts. We
can therefore de-assert w/o notify.

By using a new source ID, we separate the two so users of the new oadn
can't race the old and we can cleanly free the old source ID,
de-asserting it.

> > Instead each OADN
> > object gets it's own source ID, but these are all shared by users
> > of the same GSI. So for PCI devices, we might have up to 4 IRQ
> > source IDs allocated.
> >
> > Michael had also suggested avoiding reference counting and using
> > list_empty for this OADN object. Unfortunately, that doesn't work
> > for similar reasons. We want to release the OADN object underlock,
> > preventing others from re-using it on the free path, but in order
> > to have lock-less de-assert & notify we use RCU, meaning we can't
> > trust list_empty until after an RCU grace period, which must be
> > done outside of spinlocks.
>
> confused. list empty on assign/deassing would be under lock
> so no need for grace periods to trust it.
> what am I missing?
>
> But if you like kref more that is OK too.

Maybe I'm misinterpreting this:

include/linux/rculist.h:
/**
* list_del_rcu - deletes entry from list without re-initialization
* @entry: the element to delete from the list.
*
* Note: list_empty() on entry does not return true after this,
* the entry is in an undefined state. It is useful for RCU based
* lockfree traversal.

If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
it again, then we can drop the kref. Thanks,

Alex


> > If there are suggestions how we can handle these better, please
> > make them, but I think this compromise is race-free and still
> > manages to make allocation of IRQ source IDs mostly a non-issue
> > for device assignment limits. Thanks,
> >
> > Alex
> >
> > ---
> >
> > Alex Williamson (2):
> > kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> > kvm: Use a reserved IRQ source ID for irqfd
> >
> >
> > Documentation/virtual/kvm/api.txt | 13 ++
> > arch/x86/kvm/x86.c | 4 +
> > include/linux/kvm.h | 7 +
> > include/linux/kvm_host.h | 2
> > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++-
> > 5 files changed, 218 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/