Re: USB misbehavior causes system hang

From: Pete Zaitcev
Date: Tue Feb 27 2007 - 16:51:55 EST


On Tue, 27 Feb 2007 09:06:21 -0500, Eric Buddington <ebuddington@xxxxxxxxxxx> wrote:

> sd 1:0:0:0: rejecting I/O to offline device
> ...
> SoftDog: Initiating system reboot.

> Now, the USB problem may well be a device or cabling issue, but I
> don't think that this drive failure should trigger a reboot - I assume
> the drive failure is somehow constipating the entire disk I/O system,
> and preventing my softdog-patting script from running.

Have you tried the ub driver? In theory, its threadless design is
supposed to help with just this kind of problem. Please let me know,
I'm very curious.

However, the main issue here is the OOM with all the dirty data.
We have seen that before. For some weird reason, ext3 is especially
good at producing immense amounts of write-out. Are you on ext3 or
VFAT on that drive?
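
In case it helps with answering the two questions above, here is a
rough Python sketch that pulls the Dirty and Writeback counters out of
/proc/meminfo-style text and looks up the filesystem type of a device
in /proc/mounts-style text. The function names and the sample device
/dev/sda1 are made up for illustration; substitute whatever node your
USB drive actually shows up as.

```python
def dirty_kb(meminfo_text):
    """Return the Dirty and Writeback sizes (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Dirty:      512 kB".
        name, _, rest = line.partition(":")
        if name in wanted and rest.split():
            result[name] = int(rest.split()[0])
    return result


def fs_type(mounts_text, device):
    """Return the filesystem type of `device` from /proc/mounts text, or None."""
    for line in mounts_text.splitlines():
        # Fields are: device, mount point, fs type, options, dump, pass.
        fields = line.split()
        if len(fields) >= 3 and fields[0] == device:
            return fields[2]
    return None


# Hypothetical sample input, not taken from the reported system.
sample_meminfo = "MemTotal: 1000 kB\nDirty:  512 kB\nWriteback: 8 kB\n"
sample_mounts = "/dev/sda1 /mnt/usb ext3 rw 0 0\n"
print(dirty_kb(sample_meminfo))
print(fs_type(sample_mounts, "/dev/sda1"))
```

On a live system you would feed it open("/proc/meminfo").read() and
open("/proc/mounts").read() instead of the sample strings. Sampling
Dirty every few seconds while the drive misbehaves should show whether
the dirty data really keeps piling up.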

Please try to capture the CPU traces by hitting SysRq-w and SysRq-p.
A CPU is looping under a lock somewhere, and eventually that causes
the watchdog to trigger. It may be a USB issue or it may be a VM
issue. I can't tell until we get stack traces.
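
If the keyboard is not handy, the same SysRq commands can be issued by
writing the key letter to /proc/sysrq-trigger as root. Below is a
minimal Python sketch of that; send_sysrq is a name I made up, and it
assumes the box is still alive enough to run a script, which may not
be the case once the looping starts. The traces end up in the kernel
log (dmesg or the console), not in the script's output.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def send_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write a single SysRq command letter to the trigger file."""
    if len(key) != 1:
        raise ValueError("SysRq command must be a single character")
    # Writing one character has the same effect as pressing SysRq-<key>.
    with open(trigger_path, "w") as f:
        f.write(key)


def dump_traces(trigger_path=SYSRQ_TRIGGER):
    """Request the two dumps asked for above: 'w' and then 'p'."""
    for key in ("w", "p"):
        send_sysrq(key, trigger_path)
```

Calling dump_traces() as root a few times, a second or so apart, makes
it easier to see whether the CPU stays in the same place.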

This does not help you deal with the unreliable drive, I'm afraid,
but it would be a great service if you pinned down the reason for the
looping.

-- Pete