Re: 2.6.36-rc7: NULL pointer dereference in ehci_clear_tt_buffer_complete

From: Alan Stern
Date: Tue Oct 12 2010 - 14:23:38 EST


On Mon, 11 Oct 2010, Stefan Richter wrote:

> Hi,
>
> I have a monitor with a built-in hub to which a keyboard, mouse, and
> card reader are connected. On one occasion when I switched the monitor
> off, the following oops happened.
>
> I updated from 2.6.36-rc4 to 2.6.36-rc7 on Saturday evening, so
> it may be a regression after 2.6.36-rc4, but it isn't necessarily one.

I don't think it's a regression.

> Oct 11 22:29:21 stein kernel: usb 1-1.2: USB disconnect, address 14
> Oct 11 22:29:21 stein kernel: drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-1.3/input0, status -71
> Oct 11 22:29:21 stein kernel: usb 1-1: clear tt 3 (00f0) error -71
> Oct 11 22:29:21 stein kernel: drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-1.3/input1, status -71
> Oct 11 22:29:21 stein kernel: hub 1-1:1.0: hub_port_status failed (err = -71)
> Oct 11 22:29:21 stein kernel: hub 1-1:1.0: connect-debounce failed, port 2 disabled
> Oct 11 22:29:21 stein kernel: usb 1-1: USB disconnect, address 12
> Oct 11 22:29:21 stein kernel: usb 1-1.1: USB disconnect, address 13
> Oct 11 22:29:21 stein kernel: usb 1-1.1.1: USB disconnect, address 16
> Oct 11 22:29:21 stein kernel: usb 1-1: clear tt 3 (00f0) error -71
> Oct 11 22:29:21 stein kernel: usb 1-1.3: USB disconnect, address 15
> Oct 11 22:29:21 stein kernel: ehci_hcd 0000:00:12.2: qh ffff880208f07af0 (#00) state 5
> Oct 11 22:29:21 stein kernel: drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-1.3/input0, status -108
> Oct 11 22:29:21 stein kernel: usb 1-1: clear tt 3 (00f0) error -19
> Oct 11 22:29:21 stein kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> Oct 11 22:29:21 stein kernel: IP: [<ffffffff8129dbd8>] ehci_clear_tt_buffer_complete+0x31/0x72

Is this reproducible? I'd guess that it happens only some of the
times you turn off the monitor. Maybe it will be more likely to happen
if you are moving the mouse while you turn off the monitor.

At any rate, I'm baffled. This log entry:

ehci_hcd 0000:00:12.2: qh ffff880208f07af0 (#00) state 5

indicates that the ehci-hcd data structures were seriously messed up.
It means the qh was not on the async list at a time when it should have
been (an URB for that qh was completing).
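To make the invariant concrete, here is a minimal, hypothetical C model of the consistency check being described. It assumes the QH_STATE_* numbering used in ehci.h of this era, under which "state 5" corresponds to QH_STATE_COMPLETING; `struct model_qh`, its `on_async_list` flag and `qh_consistent()` are invented for illustration and are not ehci-hcd code.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of an EHCI queue head (qh).
 * The numeric values are assumed to follow the QH_STATE_* constants
 * in ehci.h of this era.
 */
enum qh_state {
	QH_STATE_LINKED      = 1, /* host controller sees this qh */
	QH_STATE_UNLINK      = 2, /* HC may still see it */
	QH_STATE_IDLE        = 3, /* HC does not see it */
	QH_STATE_UNLINK_WAIT = 4, /* still linked, waiting to be unlinked */
	QH_STATE_COMPLETING  = 5, /* URBs for this qh are being given back */
};

struct model_qh {
	enum qh_state state;
	bool on_async_list; /* hypothetical flag, not a real ehci-hcd field */
};

/*
 * Simplified invariant: a qh that is linked, or whose URBs are still
 * completing, must be on the async list; an idle qh must not be.
 * Other states are deliberately left unchecked in this sketch.
 */
static bool qh_consistent(const struct model_qh *qh)
{
	switch (qh->state) {
	case QH_STATE_LINKED:
	case QH_STATE_COMPLETING:
		return qh->on_async_list;
	case QH_STATE_IDLE:
		return !qh->on_async_list;
	default:
		return true;
	}
}
```

In this model, the situation in the log is a qh with state QH_STATE_COMPLETING and `on_async_list == false`, which `qh_consistent()` rejects.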

That led to ehci_clear_tt_buffer_complete crashing with qh == NULL,
which is what that faulting address means, right? You could add a test for
NULL, but that would merely cover up the symptom: qh should never be
NULL at that point.
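The oops line "NULL pointer dereference at 0000000000000060" fits this reading: when a struct pointer is NULL, the faulting address is just the offset of the member being accessed. (Separately, the IP line's "+0x31/0x72" means 0x31 bytes into a function that is 0x72 bytes long.) The sketch below shows that arithmetic using a hypothetical stand-in, `struct fake_qh`, padded so that a member sits at offset 0x60; the real layout of struct ehci_qh, and which member lives at 0x60, cannot be determined from the log alone.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct ehci_qh: padding is chosen
 * so that `some_field` lands at offset 0x60, matching the fault address.
 */
struct fake_qh {
	char other_fields[0x60]; /* stand-in for everything before the member */
	unsigned int some_field; /* hypothetical member at offset 0x60 */
};

/*
 * Address the CPU would touch when accessing qh->some_field,
 * given the qh base address as an integer. With base == 0 (a NULL
 * qh on a typical platform) this is exactly the member's offset.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct fake_qh, some_field);
}
```

Computing base + offsetof() on an integer address, instead of actually dereferencing a NULL pointer, keeps the sketch free of undefined behavior while showing why a small fault address such as 0x60 points to a NULL struct pointer.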

Alan Stern
