Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)

From: Vegard Nossum
Date: Fri Dec 19 2008 - 12:19:56 EST


On Thu, Dec 18, 2008 at 11:06 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> Hi,
>
> With a patch like this:
>
> diff --git a/init/main.c b/init/main.c
> index 7e117a2..2f93119 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -465,6 +465,8 @@ static void noinline __init_refok rest_init(void)
> {
> int pid;
>
> + *(char *) NULL = 0;
> +
> kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
> numa_default_policy();
> pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
>
> ...I would expect a single page fault and nothing more. panic() is
> called as expected, but it then triggers a second page fault. Here is
> the log:
>
> (this part is correct and expected)
>
> [ 0.031003] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 0.033997] IP: [<c13b448b>] rest_init+0xf/0x57
> [ 0.035997] *pde = 00000000
> [ 0.037289] Oops: 0002 [#1] SMP
> [ 0.037994] last sysfs file:
> [ 0.037994] Modules linked in:
> [ 0.037994]
> [ 0.037994] Pid: 0, comm: swapper Not tainted (2.6.28-rc7 #181) 945P-A
> [ 0.037994] EIP: 0060:[<c13b448b>] EFLAGS: 00010246 CPU: 0
> [ 0.037994] EIP is at rest_init+0xf/0x57
> [ 0.037994] EAX: c16631e3 EBX: 00000040 ECX: 00000a00 EDX: 00000000
> [ 0.037994] ESI: 00099800 EDI: c160a000 EBP: c165dfd0 ESP: c165dfd0
> [ 0.037994] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 0.037994] Process swapper (pid: 0, ti=c165c000 task=c156e334 task.ti=c165c000)
> [ 0.037994] Stack:
> [ 0.037994] c165dfe0 c16637af c1691768 00000000 c165dff8 c1663080 0175a000 00000000
> [ 0.037994] c14ceaae 00020800 01b7e003 00000000
> [ 0.037994] Call Trace:
> [ 0.037994] [<c16637af>] ? start_kernel+0x2a2/0x2a7
> [ 0.037994] [<c1663080>] ? __init_begin+0x80/0x88
> [ 0.037994] Code: 00 8b 43 04 8d 56 04 89 50 04 89 46 04 8d 43 04 89 46 08 89 53 04 fe 03 5b 5e 5d c3 55 b9 00 0a 00 00 89 e5 31 d2 b8 e3 31 66 c1 <c6> 05 00 00 00 00 00 e8 00 df c4 ff b9 00 06 00 00 31 d2 b8 af
> [ 0.037994] EIP: [<c13b448b>] rest_init+0xf/0x57 SS:ESP 0068:c165dfd0
> [ 0.038004] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 0.038998] Kernel panic - not syncing: Attempted to kill the idle task!
>
> And now the unexpected part:
>
> [ 0.039999] Rebooting in 10 seconds..<1>BUG: unable to handle kernel NULL pointer dereference at 0000004c
> [ 0.040993] IP: [<c13b41dc>] klist_next+0x10/0x8d
> [ 0.040993] *pde = 00000000
> [ 0.040993] Oops: 0000 [#2] SMP
> [ 0.040993] last sysfs file:
> [ 0.040993] Modules linked in:
> [ 0.040993]
> [ 0.040993] Pid: 0, comm: swapper Tainted: G D (2.6.28-rc7 #181) 945P-A
> [ 0.040993] EIP: 0060:[<c13b41dc>] EFLAGS: 00010286 CPU: 0
> [ 0.040993] EIP is at klist_next+0x10/0x8d
> [ 0.040993] EAX: 0000003c EBX: c165dd60 ECX: 00000000 EDX: c165dd60
> [ 0.040993] ESI: c165dd60 EDI: 00000000 EBP: c165dd58 ESP: c165dd48
> [ 0.040993] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 0.040993] Process swapper (pid: 0, ti=c165c000 task=c156e334 task.ti=c165c000)
> [ 0.040993] Stack:
> [ 0.040993] c1026a97 c165dd60 c165dd60 00000000 c165dd74 c11be196 0000003c 00000000
> [ 0.040993] c13dbde0 00001078 00000100 c165dd84 c114ed26 c114e458 c13dbde0 c165dd9c
> [ 0.040993] c1152546 ffffffff c13dbde0 00002710 0000000b c165ddac c115259a ffffffff
> [ 0.040993] Call Trace:
> [ 0.040993] [<c1026a97>] ? release_console_sem+0x16c/0x199
> [ 0.040993] [<c11be196>] ? bus_find_device+0x4e/0x6e
> [ 0.040993] [<c114ed26>] ? no_pci_devices+0x17/0x2d
> [ 0.040993] [<c114e458>] ? find_anything+0x0/0xa
> [ 0.040993] [<c1152546>] ? pci_get_subsys+0x15/0x5b
> [ 0.040993] [<c115259a>] ? pci_get_device+0xe/0x10
> [ 0.040993] [<c1013342>] ? mach_reboot_fixups+0x27/0x3c
> [ 0.040993] [<c100fdfb>] ? native_machine_emergency_restart+0x3e/0xd7
> [ 0.040993] [<c100fc60>] ? machine_emergency_restart+0x9/0xb
> [ 0.040993] [<c1032b4b>] ? emergency_restart+0x8/0xa
> [ 0.040993] [<c13cf607>] ? panic+0xb9/0xd6
> [ 0.040993] [<c1028ee0>] ? do_exit+0x5b/0x740
> [ 0.040993] [<c13cf633>] ? printk+0xf/0x11
> [ 0.040993] [<c10263df>] ? print_oops_end_marker+0x1e/0x23
> [ 0.040993] [<c13d1c1e>] ? oops_end+0x7f/0x87
> [ 0.040993] [<c1005a08>] ? die+0x5b/0x63
> [ 0.040993] [<c13d3061>] ? do_page_fault+0x581/0x66f
> [ 0.040993] [<c103a537>] ? sched_clock_cpu+0x136/0x142
> [ 0.040993] [<c103a537>] ? sched_clock_cpu+0x136/0x142
> [ 0.040993] [<c1039165>] ? ktime_get+0x13/0x2f
> [ 0.040993] [<c103a562>] ? sched_clock_idle_sleep_event+0xe/0x10
> [ 0.040993] [<c102a86f>] ? __do_softirq+0x119/0x121
> [ 0.040993] [<c116e282>] ? acpi_hw_low_level_read+0x3b/0x68
> [ 0.040993] [<c116e34f>] ? acpi_hw_register_read+0xa0/0x112
> [ 0.040993] [<c116e4f0>] ? acpi_get_register_unlocked+0x2c/0x48
> [ 0.040993] [<c1161aa7>] ? acpi_os_release_lock+0x8/0xa
> [ 0.040993] [<c116e67b>] ? acpi_get_register+0x2d/0x34
> [ 0.040993] [<c13d2ae0>] ? do_page_fault+0x0/0x66f
> [ 0.040993] [<c13d13da>] ? error_code+0x72/0x78
> [ 0.040993] [<c16631e3>] ? kernel_init+0x0/0x148
> [ 0.040993] [<c13b448b>] ? rest_init+0xf/0x57
> [ 0.040993] [<c16637af>] ? start_kernel+0x2a2/0x2a7
> [ 0.040993] [<c1663080>] ? __init_begin+0x80/0x88
> [ 0.040993] Code: 89 4a 04 74 08 8d 41 0c e8 fa 04 d9 ff 5d c3 55 31 c9 89 e5 e8 e0 ff ff ff 5d c3 55 89 e5 57 56 89 c6 53 83 ec 04 8b 00 8b 7e 04 <8b> 50 10 89 55 f0 e8 7b cf 01 00 85 ff 74 23 8b 47 04 ba ec 42
> [ 0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
>
> I know this is not much to fuss about, since it was artificially
> induced with the NULL pointer dereference. But if a real error like
> this made it into the kernel, the second oops could scroll the real
> one off the screen. Anyway, to reproduce, apply the patch and boot
> with panic=10 (panic=1 also works). Thanks for your attention,

Reverting:

commit 70308923d317f2ad4973c30d90bb48ae38761317
Author: Greg Kroah-Hartman <gregkh@xxxxxxx>
Date: Wed Feb 13 22:30:39 2008 -0800

PCI: make no_pci_devices() use the pci_bus_type list

no_pci_devices() should use the driver core list of PCI devices, not our
"separate" one.

Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>

fixes it.
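
The fault address in the second oops can be read off the register dump.
The bytes at the faulting EIP, <8b> 50 10, decode to
mov 0x10(%eax),%edx, and EAX is 0000003c, so the access lands at
0x3c + 0x10 = 0x4c. That would fit klist_next() loading a member at
offset 0x10 through an i_klist pointer that is itself only a small
offset from NULL. The user-space sketch below just reproduces that
arithmetic. The struct names list_head32 and klist32, the 4-byte lock
and the member order are assumptions for a 32-bit build without lock
debugging, not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-ins; every pointer is modelled as uint32_t. */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

struct klist32 {
	uint32_t k_lock;		/* assumed 4-byte spinlock */
	struct list_head32 k_list;	/* 8 bytes */
	uint32_t get;			/* function pointer, offset 0x0c */
	uint32_t put;			/* function pointer, offset 0x10 */
};

/* Address touched by an x86 "mov disp(%reg),%dst" style access. */
uint32_t effective_address(uint32_t base, uint32_t displacement)
{
	return base + displacement;
}

/* Address that reading ->put through a bogus klist pointer would touch. */
uint32_t klist_put_fault_address(uint32_t i_klist)
{
	return effective_address(i_klist,
				 (uint32_t)offsetof(struct klist32, put));
}
```

Calling klist_put_fault_address(0x3c) with the EAX value from the oops
gives 0x4c under these layout assumptions, matching the reported
address. Where the 0x3c itself comes from is not shown in the log.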


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036