Re: [PATCH v5 3/4] kvm: Create kvm_clear_irq()

From: Michael S. Tsirkin
Date: Wed Jul 18 2012 - 07:08:05 EST


On Wed, Jul 18, 2012 at 01:53:15PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2012 at 01:51:05PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 01:36:08PM +0300, Gleb Natapov wrote:
> > > On Wed, Jul 18, 2012 at 01:33:35PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 18, 2012 at 01:27:39PM +0300, Gleb Natapov wrote:
> > > > > On Wed, Jul 18, 2012 at 01:20:29PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 18, 2012 at 09:27:42AM +0300, Gleb Natapov wrote:
> > > > > > > On Tue, Jul 17, 2012 at 07:14:52PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > _Seems_ racy, or _is_ racy? Please identify the race.
> > > > > > > >
> > > > > > > > Look at this:
> > > > > > > >
> > > > > > > > static inline int kvm_irq_line_state(unsigned long *irq_state,
> > > > > > > >                                      int irq_source_id, int level)
> > > > > > > > {
> > > > > > > > 	/* Logical OR for level trig interrupt */
> > > > > > > > 	if (level)
> > > > > > > > 		set_bit(irq_source_id, irq_state);
> > > > > > > > 	else
> > > > > > > > 		clear_bit(irq_source_id, irq_state);
> > > > > > > >
> > > > > > > > 	return !!(*irq_state);
> > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > Now:
> > > > > > > > If another CPU changes some other bit after the atomic change,
> > > > > > > > it looks like !!(*irq_state) might return a stale value.
> > > > > > > >
> > > > > > > > CPU 0 clears bit 0. CPU 1 sets bit 1. CPU 1 sets level to 1.
> > > > > > > > If CPU 0 sees a stale value now, it will return 0 here
> > > > > > > > and the interrupt will get cleared.
> > > > > > > >
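A minimal timeline of the interleaving described above (the bit numbers and
the exact ordering are illustrative only):

    CPU 0 (source 0, level 0)         CPU 1 (source 1, level 1)
    -------------------------         -------------------------
    clear_bit(0, irq_state);
                                      set_bit(1, irq_state);
    reads *irq_state but still
    sees bit 1 as 0 (stale value)
    -> returns 0

The caller then propagates level 0 to the PIC/IOAPIC even though source 1
has just asserted its line, so the interrupt gets cleared.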
> > > > > > > This will hardly happen on x86, especially since the bit is set with a
> > > > > > > serialized instruction.
> > > > > >
> > > > > > Probably. But it does make me a bit uneasy. Why don't we pass
> > > > > > irq_source_id to kvm_pic_set_irq/kvm_ioapic_set_irq, and move
> > > > > > kvm_irq_line_state under pic_lock/ioapic_lock? We can then use
> > > > > > __set_bit/__clear_bit in kvm_irq_line_state, making the ordering simpler
> > > > > > and saving an atomic op in the process.
> > > > > >
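A rough sketch of that suggestion, assuming hypothetical prototypes (the
pic_lock()/pic_unlock(), pic_set_irq1() and irq_states[] names follow the
existing i8259 code, but the exact signatures here are illustrative, not the
actual patch):

static int kvm_irq_line_state_locked(unsigned long *irq_state,
				     int irq_source_id, int level)
{
	/* Caller holds pic_lock/ioapic_lock, so the non-atomic bitop and
	 * the read-back below cannot race with another source. */
	if (level)
		__set_bit(irq_source_id, irq_state);
	else
		__clear_bit(irq_source_id, irq_state);

	return !!(*irq_state);
}

int kvm_pic_set_irq(struct kvm_pic *s, int irq, int irq_source_id, int level)
{
	int ret;

	pic_lock(s);
	level = kvm_irq_line_state_locked(&s->irq_states[irq],
					  irq_source_id, level);
	ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, level);
	pic_unlock(s);

	return ret;
}

The same pattern would apply to kvm_ioapic_set_irq() under the ioapic lock.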
> > > > > With my patch I do not see why we can't change them to the unlocked
> > > > > variants without moving them anywhere. The only requirement is not to use
> > > > > an RMW sequence to set/clear the bits. The ordering of the writes does not
> > > > > matter; the ordering of the read does.
> > > >
> > > > You want to use __set_bit/__clear_bit on the same word
> > > > from multiple CPUs, without locking?
> > > > Why won't this lose information?
> > > Because it is not RMW. If it is then yes, you can't do that.
> >
> > You are saying __set_bit does not do RMW on x86? Interesting.
> I think it doesn't.

Is there anywhere I can read about this?

> > It's probably not a good idea to rely on this, I think.
> >
> The code is not in arch/x86, so probably not, although it is used only on
> x86 (and ia64, which has broken kvm anyway).

Yes, but exactly the reverse is documented.

/**
 * __set_bit - Set a bit in memory
 * @nr: the bit to set
 * @addr: the address to start counting from
 *
 * Unlike set_bit(), this function is non-atomic and may be reordered.

>>>> pls note the below

 * If it's called on the same region of memory simultaneously, the effect
 * may be that only one operation succeeds.
>>>> until here

 */
static inline void __set_bit(int nr, volatile unsigned long *addr)
{
	asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
}
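
For comparison, the generic fallback in include/asm-generic/bitops/non-atomic.h
(quoted roughly from memory) is an explicit load/OR/store, so two CPUs hitting
the same word concurrently really can lose one of the updates:

static inline void __set_bit(int nr, volatile unsigned long *addr)
{
	unsigned long mask = BIT_MASK(nr);
	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

	*p |= mask;	/* plain read-modify-write, not atomic */
}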




> > > >
> > > > In any case, it seems simpler and safer to do the accesses under the lock
> > > > than to rely on a specific usage pattern.
> > > >