Re: 2.6.23-rc6: hanging ext3 dbench tests

From: Linus Torvalds
Date: Tue Sep 11 2007 - 12:59:55 EST

On Tue, 11 Sep 2007, Andy Whitcroft wrote:
>
> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3. Both on powerpc,
> though both significantly different hardware setups. A failed
> run like this implies that the machine was still responsive to
> other processes but the dbench was making no progress. There is
> no console diagnostics during the failure.

Since the machine seems to be otherwise alive, can you do a sysrq-W? That
is most easily done by just doing a

	echo w > /proc/sysrq-trigger

and you don't actually need any console access or anything like that.

That should give you all the blocked process traces, and if it's a
deadlock on some semaphore or other, it should all stand out quite nicely.

In fact, things like the above are probably worth scripting for any
automated testing - if you auto-fail after some time, please make the
failure case do that sysrq-W by default.
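
A minimal sketch of what such a harness hook might look like, assuming the
test is launched through coreutils timeout(1) and the script runs as root.
The function name and the SYSRQ_TRIGGER and SYSRQ_LOG variables are made up
for illustration; they are not part of any existing test framework.

```shell
#!/bin/sh
# Hypothetical wrapper: run a test command under a time limit and, if it
# times out, request a sysrq-W dump of blocked tasks before failing.

# Overridable so the wrapper can be dry-run without root (an assumption of
# this sketch, not a kernel feature).
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

run_with_sysrq_on_timeout() {
    limit="$1"; shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # timeout(1) exits with 124 when the command ran out of time
        echo "test timed out after $limit, requesting sysrq-W" >&2
        echo w > "$SYSRQ_TRIGGER"
        # The traces land in the kernel log; keep a copy with the results
        dmesg > "${SYSRQ_LOG:-sysrq-w.log}" 2>/dev/null || true
    fi
    return "$status"
}
```

Something like "run_with_sysrq_on_timeout 3600 ./run-dbench.sh" (the script
name is a placeholder) would then leave the blocked-task traces next to the
other results of the failed run.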

(The other sysrq things can be useful too - "T" shows the same as "W",
except for _all_ tasks, which is often so verbose that it hides the
problem, but is sometimes the right thing to do).

Linus