Re: debug rsdl 0.33

From: Andrew Morton
Date: Sun Mar 25 2007 - 18:05:40 EST


On Sun, 25 Mar 2007 19:28:57 +0100 "Torsten Kaiser" <just.for.lkml@xxxxxxxxxxxxxx> wrote:

> On 3/24/07, Con Kolivas <kernel@xxxxxxxxxxx> wrote:
> > kernel/sched.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 51 insertions(+)
>
> 2.6.21-rc4-mm1 also fails for me.
>
> I tried pure 2.6.21-rc4-mm1, +hotfixes, +hotfixes+rsdl33, and finally
> also added the above debug patch.
>
> The oops with the debug patch added:
> [ 65.426126] Freeing unused kernel memory: 312k freed
> (on the console the system is starting up, getting as far as "Letting udev
> process events ...")
> [ 66.665611] Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
> [ 66.682030] [<ffffffff8026167c>] __sched_text_start+0x4dc/0xa0e
> [ 66.707402] PGD 0
> [ 66.713473] Oops: 0000 [1] SMP
> [ 66.722968] last sysfs file: devices/pci0000:00/0000:00:05.0/host2/target2:0:0/2:0:0:0/type
> [ 66.747954] CPU 0
> [ 66.754025] Modules linked in:
> [ 66.763209] Pid: 1200, comm: udevd Not tainted 2.6.21-rc4-mm1 #4
> [ 66.781162] RIP: 0010:[<ffffffff8026167c>] [<ffffffff8026167c>] __sched_text_start+0x4dc/0xa0e
> [ 66.807236] RSP: 0018:ffff81007d38fe78 EFLAGS: 00010082
> [ 66.823115] RAX: ffffffffffffffd0 RBX: 000000000000008c RCX: 000000000000058e
> [ 66.844439] RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
> [ 66.865767] RBP: ffff81007d38ff08 R08: 0000000000000064 R09: ffff810001014a58
> [ 66.887092] R10: 000000000000001c R11: 0000000000000246 R12: ffff810001013700
> [ 66.908418] R13: ffff810001014198 R14: 0000000000000001 R15: 0000000f859461fc
> [ 66.929745] FS: 00002b67df90e6d0(0000) GS:ffffffff807aa000(0000) knlGS:0000000000000000
> [ 66.953950] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 66.971126] CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
> [ 66.992451] Process udevd (pid: 1200, threadinfo ffff81007d38e000, task ffff81007e354100)
> [ 67.016915] Stack: 00000000000004b0 0000000000000000 0000000000000000 ffff81007e354100
> [ 67.041097] ffffffffffffffd0 ffff81007e354298 ffff81011d420680 ffffffff802234b1
> [ 67.063407] 0000000000000001 0000000000000000 0000000000000000 0000000000000246
> [ 67.085149] Call Trace:
> [ 67.093037] [<ffffffff802234b1>] filp_close+0x71/0x90
> [ 67.108397] [<ffffffff80214d97>] do_exit+0x7e7/0x800
> [ 67.123495] [<ffffffff80248372>] do_group_exit+0x82/0x90
> [ 67.139634] [<ffffffff8025c1de>] system_call+0x7e/0x83
> [ 67.155277]
> [ 67.159739]
> [ 67.159740] Code: 48 39 48 50 0f 84 8b 00 00 00 48 c7 40 40 00 00 00 00 8b 52
> [ 67.186877] RIP [<ffffffff8026167c>] __sched_text_start+0x4dc/0xa0e
> [ 67.205919] RSP <ffff81007d38fe78>
> [ 67.216348] CR2: 0000000000000020
> [ 67.226260] Fixing recursive fault but reboot is needed!

We've seen multiple reports of this.

For some reason we've managed to confuse kallsyms too.

> The system is x86_64, two 2218 CPUs on an MCP55 nvidia chipset.
>
> 2.6.21-rc3-mm1 works fine.
>
> (gdb) list *0xffffffff8026167c
> 0xffffffff8026167c is in schedule (kernel/sched.c:3619).
> 3614 /*
> 3615 * When the task is chosen it is checked to see if its quota has been
> 3616 * added to this runqueue level which is only performed once per
> 3617 * level per major rotation for each running task.
> 3618 */
> 3619 if (next->rotation != rq->prio_rotation) {
> 3620 /* Task has moved during major rotation */
> 3621 task_new_array(next, rq);
> 3622 if (!entitled_slot(next->static_prio, idx))
> 3623 exchange_slot(next, rq);
>
>

Ah, that helps, thanks.
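
For anyone following along, the register dump and the Code: bytes can be
cross-checked with a little arithmetic. The first four code bytes,
48 39 48 50, decode to cmp %rcx,0x50(%rax), and RAX is ffffffffffffffd0,
so the access wraps around to 0x20, which matches CR2. That suggests
(but does not prove) that "next" itself is the bad pointer, and that it
may have come from a container_of()-style subtraction of 0x30 applied to
a NULL pointer. The helper names below are made up for illustration and
the 0x30 member offset is an inference, not something taken from
kernel/sched.c.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a base + disp8 memory operand, wrapping at
 * 64 bits the way the CPU does. */
static uint64_t effective_address(uint64_t base, int8_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}

/* ASSUMPTION: if `entry` was produced by subtracting a member offset
 * from a NULL pointer, this recovers that offset. */
static uint64_t implied_member_offset(uint64_t entry)
{
	return (uint64_t)0 - entry;
}

/* Values copied from the oops quoted above. */
static const uint64_t OOPS_RAX = 0xffffffffffffffd0ULL;
static const uint64_t OOPS_CR2 = 0x0000000000000020ULL;

/* Hypothetical self-check: aborts if the numbers do not line up. */
static void check_oops_arithmetic(void)
{
	/* 48 39 48 50  =>  cmp %rcx,0x50(%rax) */
	assert(effective_address(OOPS_RAX, 0x50) == OOPS_CR2);
	/* 0 - 0xffffffffffffffd0 == 0x30 */
	assert(implied_member_offset(OOPS_RAX) == 0x30);
}
```

Calling check_oops_arithmetic() from a small main() returns silently if
the faulting address really is RAX + 0x50.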
