Re: Maybe a bug in kernel/irq/chip.c unmask_irq(), device IRQ masked unexpectedly. (re-formated the mail body, sorry)

From: Thomas Gleixner
Date: Mon Oct 14 2019 - 08:34:51 EST


On Tue, 8 Oct 2019, Yi Zheng wrote:
> There is some defects on IRQ processing:
>
> (1) At the beginning of handle_level_irq(), the IRQ-28 is masked, and ACK
> action is executed: On my machine, it runs the 'else' branch:
>
> static inline void mask_ack_irq(struct irq_desc *desc)
> {
> if (desc->irq_data.chip->irq_mask_ack) {
> desc->irq_data.chip->irq_mask_ack(&desc->irq_data);
> irq_state_set_masked(desc);
> } else {
> mask_irq(desc);
> if (desc->irq_data.chip->irq_ack)
> desc->irq_data.chip->irq_ack(&desc->irq_data);
> }
> }
>
> It is an 2-steps procedure:
> 1. mask_irq()
> 2. desc->irq_data.chip->irq_ack()
>
> the 2nd step, the function ptr is omap_mask_ack_irq(), which
> _MASK_ the hardware INTC-IRQ-32 and then do the real ACK action.

Sure. Where is the problem?

> (2) mask_irq()/unmask_irq() are not atomic actions: They check the
> IRQD_IRQ_MASKED flag firstly, and then mask/unmask the irq by calling
> the function ptrs which installed by irq controller drv. Then, those 2
> functions set/clear the IRQD_IRQ_MASKED flag.
>
> I think the sequence of the hw/sw action should be mirrored reversed:
> mask_irq():
> check IRQD_IRQ_MASKED;
> set hardware IRQ mask register;
> set software IRQD_IRQ_MASKED flag;
>
> unmask_irq():
> check IRQD_IRQ_MASKED;
> /* NOTE: should before the hw unmask action!! */
> clear software IRQD_IRQ_MASKED flag;
> clear hardware IRQ mask register;
>
> The current unmask_irq(), hw-mask action runs before sw-mask action,
> which gives an very small time window. That cause an unexpected
> iterated IRQ.

It's completely irrelevant because _ALL_ those operations run with
irq_desc->lock held. So nothing can actually observe that state.

> Here is my the detail of my analyzing of handle_level_irq():
>
> (1) Let record the HW-IRQ-Controller Status and the SW-Flag IRQD_IRQ_MASKED
> pair as following: (hw-mask, sw-mask).
>
> (2) In the 1st level of IRQ-28 ISR calling, in unmask_irq(), after the HW
> unmask action, and before the sw-flag IRQD_IRQ_MASKED is cleared, there
> is a VERY SMALL TIME WINDOW, in which, another IRQ-28 may triggered.
>
> In that time window, the mask status is (0, 1), which is no an valid
> value.

Again. Irrelevant because not observable.

> My fixup is in the attachment, which remove the unexpected time window of
> IRQ iteration.

Please don't send attachments. See Documentation/process/submitting-patches.rst

Thanks,

tglx