Re: recursive fault in 2.6.35.5

From: Mike Galbraith
Date: Sun May 29 2011 - 22:48:37 EST


On Sun, 2011-05-29 at 12:27 -0400, Whit Blauvelt wrote:
> Hi,
>
> This isn't a most-recent kernel, so we should upgrade the systems with it,
> but it could also be useful to know why the fault occurred. If someone here
> can easily decode the final messages when the system froze....
>
> This is vanilla 2.6.35.5, built from source, running with Ubuntu Server
> 10.04.2. Two similar systems have been running stably for months, then
> yesterday and today both froze up - one twice. On the one where I was able
> to get a remote console before rebooting the final messages are in a screen
> capture at
>
> http://www.transpect.com/jpg/sb2crash.jpg
>
> The final lines are
>
> [3521437.065988] RIP [<ffffffff81054ddc>] set_next_entity+0xc/0xa0
> [3521437.065993] RSP <ffff8801b60b1748>
> [3521437.065994] CR2: 0000000000000038
> [3521437.065997] ---[ end trace 5a40c5f226029029 ]---
> [3521437.065999] Fixing recursive fault but reboot is needed!
>
> These are basically file servers running NFS, samba, and some Python. I know
> there are recent improvements to the kernel's NFS functions. Does this point
> in that direction as the cause of the recursive fault?

No, you've been bitten by an annoyingly elusive load balancing bug.
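
For anyone decoding the quoted lines by hand: the RIP line gives the
faulting instruction as symbol+offset/size, and on x86 CR2 holds the
address whose access faulted. A CR2 as small as 0x38 is consistent with a
NULL pointer dereferenced at a member offset; that is an inference from the
numbers, not something the trace states. A minimal Python sketch of pulling
those fields apart follows. The helper name decode_oops and its regular
expressions are hypothetical, written only for the three lines quoted
above, and are not part of any kernel tooling.

```python
import re

# Lines copied from the quoted oops above.
OOPS = """\
[3521437.065988] RIP [<ffffffff81054ddc>] set_next_entity+0xc/0xa0
[3521437.065993] RSP <ffff8801b60b1748>
[3521437.065994] CR2: 0000000000000038
"""

# Hypothetical patterns; they only cover the format shown in this trace.
RIP_RE = re.compile(
    r"RIP\s+\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)
CR2_RE = re.compile(r"CR2:\s+([0-9a-f]+)")


def decode_oops(text):
    """Extract the faulting symbol, offset, size and CR2 from oops text."""
    rip = RIP_RE.search(text)
    cr2 = CR2_RE.search(text)
    if rip is None or cr2 is None:
        raise ValueError("no RIP or CR2 line found")
    address = int(rip.group(1), 16)
    offset = int(rip.group(3), 16)
    return {
        "function": rip.group(2),
        "offset": offset,                    # bytes into the function
        "size": int(rip.group(4), 16),       # total function length
        "function_start": address - offset,  # RIP minus the offset
        "fault_address": int(cr2.group(1), 16),
    }


info = decode_oops(OOPS)
```

Here info["function"] is "set_next_entity", the fault is 0xc bytes into a
0xa0-byte function, and info["fault_address"] is 0x38.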

-Mike
