Re: amd5536udc interrupts bug

From: Robert Hancock
Date: Wed Jan 07 2009 - 21:33:19 EST


Vadim Lobanov wrote:
Hello,

From perusing the code and playing with the module, it seems to me that the amd5536udc driver's handling of interrupts is currently "bustificated".

The long story:

During the amd5536udc initialization sequence within udc_pci_probe(), the code attempts to request a shared irq for the device thusly:
request_irq(pdev->irq, udc_irq, IRQF_SHARED, name, dev)
where 'dev' is the internal state structure. By the time this call is made, the 'dev' structure is still not fully initialized and contains blank/zero data for many of the fields, in particular both 'dev->lock' and 'dev->regs' which are both clearly used within the udc_irq() handler. Those get initialized a bit later, namely inside the udc_probe() call at the bottom of udc_pci_probe().

It is my understanding that a handler for a shared interrupt can be invoked at any time after the corresponding request_irq() call is made, simply because some other device on the same interrupt may already be active. This leaves us with a very noticeable race condition, where udc_irq() can be invoked with uninitialized 'dev' data.

Yes, your analysis appears correct.


In particular, this effect is very noticeable when I try modprobing the amd5536udc driver on a kernel that is built with CONFIG_DEBUG_SHIRQ set. Given that the debug config option forces an invocation of the irq handler from within the request_irq() function, the immediate effect is a masterful kernel NULL-dereference OOPS within udc_irq().

The simple fix may be to say that amd5536udc does not support shared irqs, and to remove the IRQF_SHARED flag from the request_irq() call. A

That will bust any other hardware that tries to share the interrupt. If a driver requests the interrupt without IRQF_SHARED, nothing else can request that interrupt line.

more complicated fix may be to try to shuffle all the code around to make sure that 'dev' is fully initialized before we request the irq, but I don't understand the code well enough (yet) to comfortably do this.

Yeah, that's the proper fix.


Comments? Thoughts?

On a side note, it occurs to me that the CONFIG_DEBUG_SHIRQ option went into the kernel a bit less than two years ago, and that it exposes a very immediate and reproducible OOPS in this driver. Does this mean that noone uses the 5536 UDC functionality with any recent kernels? Should I be worried? :)

Presumably nobody uses it with CONFIG_DEBUG_SHIRQ, that option wouldn't normally be used on non-debug kernels..


-- Vadim Lobanov


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/