Re: 2.6.21-rc4-mm1

From: Andy Whitcroft
Date: Fri Mar 23 2007 - 08:29:35 EST


Andy Whitcroft wrote:
> Con Kolivas wrote:
>> On Friday 23 March 2007 05:17, Andy Whitcroft wrote:
>>> OK, I have yet a third x86_64 machine that is blowing up with the latest
>>> 2.6.21-rc4-mm1+hotfixes+rsdl-0.32 but working with
>>> 2.6.21-rc4-mm1+hotfixes-RSDL. I have results on various hotfix levels
>>> so I have just fired off a set of tests across the affected machines on
>>> that latest hotfix stack plus the RSDL backout and the results should be
>>> available in the next hour or two.
>>>
>>> I think there is a strong correlation between RSDL and these hangs. Any
>>> suggestions as to the next step?
>> Found a nasty in requeue_task
>> + if (list_empty(old_array->queue + old_prio))
>> + __clear_bit(old_prio, p->array->prio_bitmap);
>>
>> see anything wrong there? I do :P
>>
>> I'll queue that up with the other changes pending and hopefully that will fix
>> your bug.
>
> Tests queued with your rsdl-0.33 patch (I am assuming it's in there).
> Will let you know how it looks.

Hmmm, this is good for the original machine (as was 0.32) but not for
either of the other two. I am seeing panics as below on those two.

-apw

elm3b245:

NULL pointer dereference at 0000000000000020 RIP:
[<ffffffff80497d94>] __sched_text_start+0x424/0x8a5
PGD 0
Oops: 0000 [1] SMP
last sysfs file: block/ram0/uevent
CPU 0
Modules linked in:
Pid: 1038, comm: udevd Not tainted 2.6.21-rc4-mm1-autokern1 #1
RIP: 0010:[<ffffffff80497d94>] [<ffffffff80497d94>] __sched_text_start+0x424/0x8a5
RSP: 0018:ffff81000316de68 EFLAGS: 00010017
RAX: 00000000000006c6 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000008c RDI: ffffffffffffffd0
RBP: ffff81000316def8 R08: 0000000000000064 R09: 0000000000000024
R10: ffff810001014ad8 R11: 0000000000000286 R12: ffff810001014218
R13: ffff810001013780 R14: ffff810001769450 R15: 0000000000000000
FS: 00002b75d89c66d0(0000) GS:ffffffff805aa000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
Process udevd (pid: 1038, threadinfo ffff81000316c000, task ffff8100031cebb0)
Stack: 0000000000000000 0000000000000001 0000000000000000 ffff8100031cebb0
ffffffffffffffd0 00000036e28ef568 ffff8100031ced48 0000000000000292
ffff81000316def8 0000000000000246 ffff81000316def8 ffffffff8022af3d
Call Trace:
[<ffffffff8022af3d>] put_files_struct+0xbd/0xc9
[<ffffffff8022c773>] do_exit+0x7d2/0x7d6
[<ffffffff8022c801>] sys_exit_group+0x0/0x14
[<ffffffff8022c813>] sys_exit_group+0x12/0x14
[<ffffffff8020968e>] system_call+0x7e/0x83


Code: 48 39 47 50 74 51 48 c7 47 40 00 00 00 00 8b 52 f4 48 b9 40
RIP [<ffffffff80497d94>] __sched_text_start+0x424/0x8a5
RSP <ffff81000316de68>
CR2: 0000000000000020
Fixing recursive fault but reboot is needed!


elm3b6:
Unable to handle kernel paging request at 000000000000fb6c RIP:
[<ffffffff8020c573>] convert_rip_to_linear+0x53/0x91
PGD 180780067 PUD 182242067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 0
Modules linked in:
Pid: 2442, comm: autorun Not tainted 2.6.21-rc4-mm1-autokern1 #1
RIP: 0010:[<ffffffff8020c573>] [<ffffffff8020c573>] convert_rip_to_linear+0x53/0x91
RSP: 0000:ffff810181a53cf8 EFLAGS: 00010002
RAX: 000000000000fb68 RBX: ffff810181a53e28 RCX: ffff8101823d6930
RDX: ffffffff8049fb6d RSI: ffff810182342180 RDI: ffff810182342440
RBP: ffff810181a53cf8 R08: 0000000080209bb9 R09: 000000000000008c
R10: 0000000000000000 R11: 0000000001200011 R12: 0000000000000000
R13: ffff810182342180 R14: ffff810181a53e28 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff805b2000(0063) knlGS:00000000f7f1cb80
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000000fb6c CR3: 0000000181a5b000 CR4: 00000000000006e0
Process autorun (pid: 2442, threadinfo ffff810181a52000, task ffff8101823d6930)
Stack: ffff810181a53d18 ffffffff80219075 ffff8101823d84a8 0000000000000020
ffff810181a53e18 ffffffff80219ab4 ffff8101fff654d8 ffff810181a53d48
ffffffff80264291 ffff8101823d6930 ffff810181a53e28 0000000000000046
Call Trace:
[<ffffffff80219075>] is_prefetch+0x29/0x217
[<ffffffff80219ab4>] do_page_fault+0x608/0x7f0
[<ffffffff80264291>] page_dup_rmap+0x1d/0x24
[<ffffffff8024567c>] search_module_extables+0x83/0x8f
[<ffffffff80229b43>] oops_enter+0xe/0x10
[<ffffffff8020ae62>] oops_begin+0x3c/0x70
[<ffffffff80219b31>] do_page_fault+0x685/0x7f0
[<ffffffff8022404d>] task_running_tick+0xad/0x290
[<ffffffff8049fb6d>] error_exit+0x0/0x84
[<ffffffff8049fb6d>] error_exit+0x0/0x84
[<ffffffff8049dc11>] thread_return+0x22/0xd3
[<ffffffff80209802>] int_careful+0xd/0x11


Code: 8b 48 04 0f b7 50 02 0f b6 c1 c1 e0 10 09 c2 89 c8 25 00 00
RIP [<ffffffff8020c573>] convert_rip_to_linear+0x53/0x91
RSP <ffff810181a53cf8>
CR2: 000000000000fb6c