Re: [PATCH v5 3/4] kvm: Create kvm_clear_irq()

From: Alex Williamson
Date: Tue Jul 17 2012 - 11:51:40 EST


On Tue, 2012-07-17 at 18:36 +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 17, 2012 at 09:20:11AM -0600, Alex Williamson wrote:
> > On Tue, 2012-07-17 at 17:53 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jul 17, 2012 at 08:21:51AM -0600, Alex Williamson wrote:
> > > > On Tue, 2012-07-17 at 17:08 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 17, 2012 at 07:56:09AM -0600, Alex Williamson wrote:
> > > > > > On Tue, 2012-07-17 at 13:14 +0300, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Jul 16, 2012 at 02:34:03PM -0600, Alex Williamson wrote:
> > > > > > > > This is an alternative to kvm_set_irq(,,,0) which returns the previous
> > > > > > > > assertion state of the interrupt and does nothing if it isn't changed.
> > > > > > > >
> > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > include/linux/kvm_host.h | 3 ++
> > > > > > > > virt/kvm/irq_comm.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > 2 files changed, 81 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > > > > index a7661c0..6c168f1 100644
> > > > > > > > --- a/include/linux/kvm_host.h
> > > > > > > > +++ b/include/linux/kvm_host.h
> > > > > > > > @@ -219,6 +219,8 @@ struct kvm_kernel_irq_routing_entry {
> > > > > > > > u32 type;
> > > > > > > > int (*set)(struct kvm_kernel_irq_routing_entry *e,
> > > > > > > > struct kvm *kvm, int irq_source_id, int level);
> > > > > > > > + int (*clear)(struct kvm_kernel_irq_routing_entry *e,
> > > > > > > > + struct kvm *kvm, int irq_source_id);
> > > > > > > > union {
> > > > > > > > struct {
> > > > > > > > unsigned irqchip;
> > > > > > > > @@ -629,6 +631,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
> > > > > > > > unsigned long *deliver_bitmask);
> > > > > > > > #endif
> > > > > > > > int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> > > > > > > > +int kvm_clear_irq(struct kvm *kvm, int irq_source_id, u32 irq);
> > > > > > > > int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> > > > > > > > int irq_source_id, int level);
> > > > > > > > void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> > > > > > > > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > > > > > > > index 5afb431..76e8f22 100644
> > > > > > > > --- a/virt/kvm/irq_comm.c
> > > > > > > > +++ b/virt/kvm/irq_comm.c
> > > > > > > > @@ -68,6 +68,42 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
> > > > > > > > return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, level);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static inline int kvm_clear_irq_line_state(unsigned long *irq_state,
> > > > > > > > + int irq_source_id)
> > > > > > > > +{
> > > > > > > > + return !!test_and_clear_bit(irq_source_id, irq_state);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int kvm_clear_pic_irq(struct kvm_kernel_irq_routing_entry *e,
> > > > > > > > + struct kvm *kvm, int irq_source_id)
> > > > > > > > +{
> > > > > > > > +#ifdef CONFIG_X86
> > > > > > > > + struct kvm_pic *pic = pic_irqchip(kvm);
> > > > > > > > + int level = kvm_clear_irq_line_state(&pic->irq_states[e->irqchip.pin],
> > > > > > > > + irq_source_id);
> > > > > > > > + if (level)
> > > > > > > > + kvm_pic_set_irq(pic, e->irqchip.pin,
> > > > > > > > + !!pic->irq_states[e->irqchip.pin]);
> > > > > > > > + return level;
> > > > > > >
> > > > > > > I think I begin to understand: if (level) checks it was previously set,
> > > > > > > and then we clear if needed?
> > > > > >
> > > > > > It's actually very simple, if we change anything in irq_states, then
> > > > > > update via the chip specific set_irq function.
> > > > > >
> > > > > > > I think it's worthwhile to rename
> > > > > > > level to orig_level and rewrite as:
> > > > > > >
> > > > > > > if (orig_level && !pic->irq_states[e->irqchip.pin])
> > > > > > > kvm_pic_set_irq(pic, e->irqchip.pin, 0);
> > > > > > >
> > > > > > > This both makes the logic clear without need for comments and
> > > > > > > saves some cycles on pic in case nothing actually changed.
> > > > > >
> > > > > > That may work, but it's not actually the same thing. kvm_set_irq(,,,0)
> > > > > > will clear the bit and call kvm_pic_set_irq with the new irq_states
> > > > > > value, whether it's 0 or 1. The optimization I make is to only call
> > > > > > kvm_pic_set_irq if we've "changed" irq_states. You're taking that one
> > > > > > step further to "changed and is now 0". I don't know if that's correct
> > > > > > behavior.
> > > > >
> > > > > If not then I don't understand. You clear a bit
> > > > > in a word. You never change it to 1, do you?
> > > >
> > > > Correct, but kvm_set_irq(,,,0) may call kvm_pic_set_irq(,,1) if other
> > > > source IDs are still asserting the interrupt. Your proposal assumes
> > > > that unless irq_states is also 0 we don't need to call kvm_pic_set_irq,
> > > > and I don't know if that's correct.
> > >
> > > Well you are asked to clear some id and level was 1. So we know
> > > interrupt was asserted. Either we clear it or we don't. No?
> > >
> > > > >
> > > > > But this brings another question:
> > > > >
> > > > > static inline int kvm_irq_line_state(unsigned long *irq_state,
> > > > > int irq_source_id, int level)
> > > > > {
> > > > > /* Logical OR for level trig interrupt */
> > > > > if (level)
> > > > > set_bit(irq_source_id, irq_state);
> > > > > else
> > > > > clear_bit(irq_source_id, irq_state);
> > > > >
> > > > >
> > > > > ^^^^^^^^^^^
> > > > > above uses locked instructions
> > > > >
> > > > > return !!(*irq_state);
> > > > >
> > > > >
> > > > > above doesn't
> > > > >
> > > > > }
> > > > >
> > > > >
> > > > > why the insonsistency?
> > > >
> > > > Note that set/clear_bit are not locked instructions,
> > >
> > > On x86 they are:
> > > static __always_inline void
> > > set_bit(unsigned int nr, volatile unsigned long *addr)
> > > {
> > > if (IS_IMMEDIATE(nr)) {
> > > asm volatile(LOCK_PREFIX "orb %1,%0"
> > > : CONST_MASK_ADDR(nr, addr)
> > > : "iq" ((u8)CONST_MASK(nr))
> > > : "memory");
> > > } else {
> > > asm volatile(LOCK_PREFIX "bts %1,%0"
> > > : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
> > > }
> > > }
> > >
> > > > but atomic
> > > > instructions and it could be argued that reading the value is also
> > > > atomic. At least that was my guess when I stumbled across the same
> > > > yesterday. IMHO, we're going off into the weeds again with these last
> > > > two patches. It may be a valid optimization, but it really has no
> > > > bearing on the meat of the series (and afaict, no significant
> > > > performance difference either).
> > >
> > > For me it's not a performance thing. IMO code is cleaner without this locking:
> > > we add a lock but only use it in some cases, so the rules become really
> > > complex.
> >
> > Seriously?
> >
> > spin_lock(&irqfd->source->lock);
> > if (!irqfd->source->level_asserted) {
> > kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > irqfd->source->level_asserted = true;
> > }
> > spin_unlock(&irqfd->source->lock);
> >
> > ...
> >
> > spin_lock(&eoifd->source->lock);
> > if (eoifd->source->level_asserted) {
> > kvm_set_irq(eoifd->kvm,
> > eoifd->source->id, eoifd->notifier.gsi, 0);
> > eoifd->source->level_asserted = false;
> > eventfd_signal(eoifd->eventfd, 1);
> > }
> > spin_unlock(&eoifd->source->lock);
> >
> >
> > Locking doesn't get much more straightforward than that
>
> Don't look at it in isolation. You are now calling kvm_set_irq
> from under a spinlock. You are saying it is always safe but
> this seems far from obvious. kvm_set_irq used to be
> unsafe from an atomic context.

Device assignment has been calling kvm_set_irq from atomic context for
quite a long time.

> > > And current code looks buggy if yes we need to fix it somehow.
> >
> >
> > Which to me seems to indicate this should be handled as a separate
> > effort.
>
> A separate patchset, sure. But likely a prerequisite: we still need to
> look at all the code. Let's not copy bugs, need to fix them.

This looks tangential to me unless you can come up with an actual reason
the above spinlock usage is incorrect or insufficient.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/