Re: kernel BUG at kernel/exit.c:792!

From: Linus Torvalds
Date: Wed Dec 03 2003 - 10:53:02 EST




On Wed, 3 Dec 2003, Srivatsa Vaddagiri wrote:
>
> I hit a kernel BUG while running some stress tests
> on a SMP machine. Details are below:
>
> Kernel : 2.6.0-test9-bk23 + CPU Hotplug Patch
> Machine : Intel 4-Way SMP box
>
> kernel BUG at kernel/exit.c:792!
> EIP is at next_thread+0x16/0x50
> Call Trace:
> [<c0180328>] get_tid_list+0x58/0x70
> [<c0180524>] proc_task_readdir+0xc4/0x17c
> [<c01658dc>] vfs_readdir+0x5c/0x70
> [<c0165be0>] filldir64+0x0/0x120
> [<c0165d64>] sys_getdents64+0x64/0xa3
> [<c0165be0>] filldir64+0x0/0x120
> [<c0109291>] sysenter_past_esp+0x52/0x71
>
> I suspect this is because by the time the read_lock call in 'get_tid_list'
> returns, the leader_task has already exited. This
> causes the NULL sighand check to trigger in the subsequent call
> to 'next_thread'?

Yup, looks right.

I think the problem is the BUG() itself, not really the caller. So I'd
prefer the fix for this to be to just entirely remove the debug tests
within that "#ifdef CONFIG_SMP", rather than hide the threads from /proc
when this happens.
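
To make the distinction concrete, here is a rough user-space sketch, not the
kernel source. The struct layout and every name in it (struct task,
next_thread_strict, next_thread_relaxed, count_threads, demo_exited_leader)
are hypothetical, and it assumes the thread-group list linkage itself stays
valid while the lock is held, so only the debug test objects to a leader
whose sighand is already NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sighand { int unused; };

struct task {
	struct task *next;       /* circular thread-group list */
	struct sighand *sighand; /* NULL once the task has exited */
};

/* Models the debug test: returns NULL where the kernel would BUG(). */
static struct task *next_thread_strict(struct task *p)
{
	if (!p->sighand)
		return NULL;
	return p->next;
}

/* Models the same walk with the debug test removed. */
static struct task *next_thread_relaxed(struct task *p)
{
	return p->next;
}

/* Walk the group once; return the thread count, or -1 if the check trips. */
static int count_threads(struct task *leader,
			 struct task *(*step)(struct task *))
{
	struct task *t = leader;
	int n = 0;

	assert(leader != NULL);
	do {
		n++;
		t = step(t);
		if (!t)
			return -1;
	} while (t != leader);
	return n;
}

/* Build a three-thread group whose leader has already exited. */
static int demo_exited_leader(int strict)
{
	struct sighand sh = { 0 };
	struct task a, b, c;

	a.next = &b; a.sighand = NULL; /* exited leader */
	b.next = &c; b.sighand = &sh;
	c.next = &a; c.sighand = &sh;

	return count_threads(&a, strict ? next_thread_strict
					: next_thread_relaxed);
}
```

In this model the strict walk gives up at the exited leader, while the relaxed
walk still visits all three threads, which is the behaviour /proc would want.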

Ingo, comments?

Linus