Re: [REGRESSION] 2.6.24/25: random lockups when accessing external USB harddrive

From: Alan Stern
Date: Mon Jun 23 2008 - 14:10:29 EST


On Mon, 23 Jun 2008, Stefan Becker wrote:

> >>> I get random machine lockups when accessing my USB harddrive with
> >>> kernels 2.6.24/25. They don't occur with kernel 2.6.23. During testing I
> >>> figured out that it has something to do with the USB Bluetooth adaptor.
> >>> If I remove it before the testing I don't get any lockups.
> >
> > Does the same problem still occur in 2.6.26-rc7?
>
> Yes.
>
>
> > Does it occur if you rmmod ehci-hcd?
>
> Yes, i.e. it also happens when the external hard drive runs as a USB 1.1
> device at 12 Mbps.
>
>
> > Machine lockups are awfully hard to debug. Can you get any information
> > at all (like Alt-SysRq-T) when this happens?
>
> SysRq does not work when the machine locks up. I forgot to mention that
> the test machine is a single CPU machine and that the CPU fan starts to
> run full speed when the lockup occurs.
>
> Guessing from the commit returned by git bisect there is a locking
> error, i.e. the CPU runs into a spinlock that is already locked and
> therefore busy loops.

That is certainly possible. But an error like that should affect lots
of different people and computers, not just your one machine.

> > Can you add debugging
> > printk statements to the USB bluetooth driver to try and localize where
> > the hang occurs?
>
> Any suggestions where to start?

Around every place where the driver calls into the core. You might
also want to debug the places where uhci-hcd acquires and releases
spinlocks.

> >>> git bisect resulted in the following bad commit:
> >>>
> >>> e9df41c5c5899259541dc928872cad4d07b82076 is first bad commit
> >>> commit e9df41c5c5899259541dc928872cad4d07b82076
> >>> Author: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> >>> Date: Wed Aug 8 11:48:02 2007 -0400
> >>>
> >>> USB: make HCDs responsible for managing endpoint queues
> >
> > Knowing this doesn't help much without more information.
>
> Too bad. Each bisect cycle took 2-3 hours and the whole process took me
> 3 days :-( :-(

I didn't mean that your efforts were wasted. They just don't help much
at this point; maybe later on they will be more useful.

> That commit has spinlock changes so I hoped that it would be a good
> starting point. Is there a way to track the locks?

Only what I suggested: Print something in the log whenever a lock is
acquired or released.
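
As a rough sketch of what I mean (plain userspace C, not actual kernel
code; the traced_lock type and the TRACED_LOCK/TRACED_UNLOCK names are
made up for illustration): wrap every acquire and release so that it
logs the calling function and line. In the driver you would do the
same thing with printk() next to the spin_lock_irqsave() and
spin_unlock_irqrestore() calls.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a kernel spinlock.  In a real driver the
 * lock would be a spinlock_t and the logging would use printk().
 */
struct traced_lock {
	const char *name;	/* label printed in the log */
	int held;		/* 1 while the lock is held */
	const char *owner;	/* function that acquired it last */
};

/* Log the attempt, then take the lock.  Returns -1 if already held. */
static int traced_acquire(struct traced_lock *l, const char *func, int line)
{
	fprintf(stderr, "%s:%d: acquiring %s\n", func, line, l->name);
	if (l->held) {
		/* A real spinlock on a single CPU would busy-loop here. */
		fprintf(stderr, "%s:%d: %s already held by %s\n",
			func, line, l->name, l->owner);
		return -1;
	}
	l->held = 1;
	l->owner = func;
	return 0;
}

/* Log the release.  Returns -1 if the lock was not held. */
static int traced_release(struct traced_lock *l, const char *func, int line)
{
	fprintf(stderr, "%s:%d: releasing %s\n", func, line, l->name);
	if (!l->held)
		return -1;
	l->held = 0;
	l->owner = NULL;
	return 0;
}

/* Record the call site automatically. */
#define TRACED_LOCK(l)		traced_acquire((l), __func__, __LINE__)
#define TRACED_UNLOCK(l)	traced_release((l), __func__, __LINE__)
```

With something like this in place, the last "acquiring" line in the log
that has no matching "releasing" line points at the place to look.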

> > Do you have any idea why nobody else has reported this sort of problem?
> > Is it reproducible on other machines?
>
> I attached both USB devices to another, newer dual core laptop. I
> couldn't reproduce the problem there, even when I simulated a single CPU
> machine with maxcpus=1.

Alan Stern
