Re: [PATCH v2] drivers/uio/uio_pdrv_genirq.c: Fix memory freeing issues

From: Vitalii Demianets
Date: Fri Dec 07 2012 - 10:00:46 EST


On Friday 07 December 2012 15:51:02 Vitalii Demianets wrote:
> On Friday 07 December 2012 11:41:45 Vitalii Demianets wrote:
> > On Friday 07 December 2012 00:15:59 Hans J. Koch wrote:
> > > On Thu, Dec 06, 2012 at 11:11:38AM +0200, Vitalii Demianets wrote:
> > > > On Thursday 06 December 2012 04:41:01 Hans J. Koch wrote:
> > > > > > @@ -63,7 +68,7 @@ static irqreturn_t uio_pdrv_genirq_handler(int
> > > > > > irq,
> > > >
> > > > struct uio_info *dev_info)
> > > >
> > > > > > * remember the state so we can allow user space to enable it
> > > > > > later. */
> > > > > >
> > > > > > - if (!test_and_set_bit(0, &priv->flags))
> > > > > > + if (!test_and_set_bit(UIO_IRQ_DISABLED, &priv->flags))
> > > > > > disable_irq_nosync(irq);
> > > > >
> > > > > That is not safe. That flag can only be changed by userspace, and
> > > > > if the userspace part is not running, the CPU on which the irq is
> > > > > pending will hang.
> > > > > The handler has to disable the interrupt in any case.
> > > > > (The original version had the same problem, I know...)
> > > >
> > > > Perhaps I'm missing something here, but I don't see your point. If
> > > > userspace is not (implicitly) calling uio_pdrv_genirq_irqcontrol() by
> > > > writing to the device file, then on the first invocation of interrupt
> > > > handler that IRQ is disabled by disable_irq_nosync() and that's the
> > > > end of story - no more irqs, no problems. Until userspace writes
> > > > non-zero to the device file, of course.
> > >
> > > If you run into the irq handler, the interrupt was obviously enabled.
> > > Since irq sharing is not supported by this driver, you ALWAYS have to
> > > disable it because it IS your interrupt. Storing the enabled/disabled
> > > status in a variable is not needed at all here. Or am I missing
> > > something?
> >
> > Good point. I agree that having UIO_IRQ_DISABLED flag is redundant.
> > Inside the irq handler we know for sure that irq is enabled. In
> > uio_pdrv_genirq_irqcontrol() this knowledge is not mandatory, we could
> > enable/disable irq unconditionally.
>
> On second thought, we can't call enable_irq()/disable_irq() unconditionally
> because of the potential disable counter (irq_desc->depth) disbalance.
> That's why we need UIO_IRQ_DISABLED flag, and that's why we should check it
> in uio_pdrv_genirq_irqcontrol().
> On the other hand, you are right in that we don't need to check it inside
> irq handler. Inside irq handler we can disable irq and set the flag
> unconditionally, because:
> a) We know for sure that irqs are enabled, because we are inside
> (not-shared) irq handler;
> and
> b) We are guarded from potential race conditions by spin_lock_irqsave()
> call in uio_pdrv_genirq_irqcontrol().
>
> So,yes, we can get rid of costly atomic call to
> test_and_set_bit(UIO_IRQ_DISABLED,..) inside irq handler. But I still don't
> like the idea of mixing this optimization with bug fixes in a single patch.

On the third thought, we can't ;)
Imagine the SMP system where uio_pdrv_genirq_irqcontrol() is being executed on
CPU0 and irq handler is running concurrently on CPU1. To protect from
disable_irq counter disbalance we must first check current irq status, and in
atomic manner. Thus we prevent double-disable, one from
uio_pdrv_genirq_irqcontrol() running on CPU0 and another form irq handler
running on CPU1.
Above consideration justifies current code.

But it seems we have potential concurrency problem here anyway. Here is
theoretical scenario:
1) Initial state: irq is enabled, uio_pdrv_genirq_irqcontrol() starts being
executed on CPU0 with irq_on=1 and at the same time, concurrently, irq
handler starts being executed on CPU1;
2) irq handler executes line
if (!test_and_set_bit(UIO_IRQ_DISABLED, &priv->flags))
as irq was enabled, the condition holds. And now UIO_IRQ_DISABLED is set.
3) uio_pdrv_genirq_irqcontrol() executes line
if (test_and_clear_bit(UIO_IRQ_DISABLED, &priv->flags))
as UIO_IRQ_DISABLED was set by CPU1, the condition holds. And now
UIO_IRQ_DISABLED is cleared.
4) CPU0 executes enable_irq() . IRQ is enabled. "Unbalanced enable for
IRQ %d\n" warning is issued.
5) irq handler on CPU1 executes disable_irq_nosync(). IRQ is disabled.

And now we are in situation where IRQ is disabled, but UIO_IRQ_DISABLED is
cleared. Bad.

The above scenario is purely theoretical, I have no means to check it in
hardware.
The (theoretical) solution is to guard UIO_IRQ_DISABLED test and actual irq
disabling inside irq handler by priv->lock.

What do you think about it? Is it worth worrying about?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/