Re: Hang in 2.6.23-rc5

From: Jesper Juhl
Date: Sun Sep 02 2007 - 18:18:24 EST


On Monday 03 September 2007 00:05:15 Richard Mittendorfer wrote:
> Also sprach "Alessandro Suardi" <alessandro.suardi@xxxxxxxxx> (Sun, 2 Sep 2007 22:38:28 +0200):
> > On 9/2/07, charles gagalac <charles.gagalac@xxxxxxxxx> wrote:
> > > On 9/2/07, daryll q wrote:
> > > > Upgraded my kernel from 2.6.23-rc2 to 2.6.23-rc5.
> > > >
> > > > System hangs (caps lock and scroll lock leds are both flashing).
> [...]
> > > i experienced hangs, with the flashing caps and scroll locks as you've
> > > described, in a few of my later pulls prior to rc5. i couldn't
> > > reproduce the hangs and my logs didn't show evidence of a problem. my
> > > system under rc5, so far, hasn't hung on me.
> [...]
> > Oh, I thought I was the only one. I also had a single hang+flashing
> > Caps & Scroll Lock with -rc5, but haven't had one since.
> [...]
>
> Hmm, just occured here, no chance to capture anything. Happend under
> some system and network load (distcc/nfs) (latest atheros/madwifi
> tainted however, but never had troubles). Not had much uptime with pre-5
> -rc's.
>
> Anything I can help to debug this?
>

First of all, try this patch :

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
        struct dst_entry *dst = __sk_dst_get(sk);
        u32 rto_min = TCP_RTO_MIN;
 
-       if (dst_metric_locked(dst, RTAX_RTO_MIN))
+       if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
                rto_min = dst->metrics[RTAX_RTO_MIN-1];
        return rto_min;
 }

If that doesn't help, then setup netconsole or serial console and
try to capture some output from the hang.
(details on how to setup net & serial consoles can be found in
Documentation/networking/netconsole.txt
and
Documentation/serial-console.txt
)
Make sure you've set your console loglevel high enough to log
everything.

Also try enabling sysrq in your kernel and, if possible, capture a
sysrq+t dump when the crash happens and send in the dmesg output
after sysrq+t - details in Documentation/sysrq.txt - there's also
info on console log level in there.

You can also try building a kernel with most (or all) of the debug
options found in the 'Kernel hacking' menu enabled. That can often
help by producing extra valuable debug output (you need to be able
to capture it though, so getting net/serial console setup as well
is usually a good idea if the box hangs completely and you can't
just get info by running dmesg).


Kind regards,
Jesper Juhl


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/