Re: 2.6.21-rc7-mm2 -- x86_64 blade hard hangs

From: Mel Gorman
Date: Thu Apr 26 2007 - 13:40:57 EST


On (26/04/07 15:46), Mel Gorman didst pronounce:
> On (26/04/07 14:33), Mel Gorman didst pronounce:
> > On (26/04/07 15:51), Andi Kleen didst pronounce:
> > > Andy Whitcroft <apw@xxxxxxxxxxxx> writes:
> > >
> > > > Getting hard boot failures before any kernel output on an x86_64 blade:
> > > >
> > > > root (hd0,0)
> > > > Filesystem type is ext2fs, partition type 0x83
> > > > kernel /vmlinuz-autobench ro root=/dev/VolGroup00/LogVol00 rhgb
> > > > console=tty0 console=ttyS1,19200 selinux=no autobench_args:
> > > > root=30726124 ABAT:1177507960 profile=2
> > > > [Linux-bzImage, setup=0x1e00, size=0x21fe48]
> > > > initrd /initrd-autobench.img
> > > > [Linux-initrd @ 0x37e5f000, 0x190984 bytes]
> > > > [silence]
> > > >
> > > > Kernel config is here:
> > > >
> > > > http://test.kernel.org/abat/85082/build/dotconfig
> > >
> > > You should boot with earlyprintk=serial,ttyS1,19200
> > >
> >
> > Here is the console log. Haven't taken a closer look yet in case it's
> > blindingly obvious to you.
> >
> > kernel /vmlinuz-autobench ro root=/dev/VolGroup00/LogVol00 rhgb
> > console=tty0 co
> > nsole=ttyS1,19200 selinux=no autobench_args: root=30726124
> > ABAT:1177594149 logl
> > evel=8 earlyprintk=serial,ttyS1,19200
> > [Linux-bzImage, setup=0x1e00, size=0x220da8]
> > initrd /initrd-autobench.img
> > [Linux-initrd @ 0x37e5f000, 0x190982 bytes]
> > Linux version 2.6.21-rc7-mm2-autokern1 (root@xxxxxxxxxxxxxxxxxxxxxxxxx)
> > (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Thu Apr 26
> > 08:16:37 CDT 2007
> > Command line: ro root=/dev/VolGroup00/LogVol00 rhgb console=tty0
> > console=ttyS1,19200 selinux=no autobench_args: root=30726124
> > ABAT:1177594149 loglevel=8 earlyprintk=serial,ttyS1,19200
> > BIOS-provided physical RAM map:
> > BIOS-e820: 0000000000000000 - 000000000009d400 (usable)
> > BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved)
> > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > BIOS-e820: 0000000000100000 - 000000003ffcddc0 (usable)
> > BIOS-e820: 000000003ffcddc0 - 000000003ffd0000 (ACPI data)
> > BIOS-e820: 000000003ffd0000 - 0000000040000000 (reserved)
> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > end_pfn_map = 1048576
> > kernel direct mapping tables up to 100000000 @ 8000-d000
> > DMI 2.3 present.
> > ACPI: RSDP 000FDFC0, 0014 (r0 IBM )
> > ACPI: RSDT 3FFCFF80, 0034 (r1 IBM SERBLADE 1000 IBM 45444F43)
> > ACPI: FACP 3FFCFEC0, 0084 (r2 IBM SERBLADE 1000 IBM 45444F43)
> > ACPI: DSDT 3FFCDDC0, 1EA6 (r1 IBM SERBLADE 1000 INTL 2002025)
> > ACPI: FACS 3FFCFCC0, 0040
> > ACPI: APIC 3FFCFE00, 009C (r1 IBM SERBLADE 1000 IBM 45444F43)
> > ACPI: SRAT 3FFCFD40, 0098 (r1 IBM SERBLADE 1000 IBM 45444F43)
> > ACPI: HPET 3FFCFD00, 0038 (r1 IBM SERBLADE 1000 IBM 45444F43)
> > SRAT: PXM 0 -> APIC 0 -> Node 0
> > SRAT: PXM 0 -> APIC 1 -> Node 0
> > SRAT: PXM 1 -> APIC 2 -> Node 1
> > SRAT: PXM 1 -> APIC 3 -> Node 1
> > SRAT: Node 0 PXM 0 0-40000000
> > PANIC: early exception rip ffffffff808a7e83 error 0 cr2 5cf0
> > Call Trace:
> > [<ffffffff808a7e83>] reserve_bootmem_node+0x0/0xc
> > [<ffffffff808a3a8c>] reserve_bootmem_generic+0x6d/0x99
> > [<ffffffff8089cc00>] setup_arch+0x2f7/0x469
> > [<ffffffff80896510>] start_kernel+0x64/0x2aa
> > [<ffffffff80896148>] _sinittext+0x148/0x14c
> > RIP reserve_bootmem_node+0x0/0xc
> >
>
> When I looked closer, nothing seemed to make sense so here is another
> log with stack unwinder, frame pointer etc etc
>
> PANIC: early exception rip ffffffff8095a873 error 0 cr2 5cf0
>
> Call Trace:
> [<ffffffff8020b174>] dump_trace+0xb6/0x3d8
> [<ffffffff8020b4d2>] show_trace+0x3c/0x52
> [<ffffffff8020b4fd>] dump_stack+0x15/0x17
> [<ffffffff802001e7>]
> DWARF2 unwinder stuck at 0xffffffff802001e7
> Leftover inexact backtrace:
> [<ffffffff8095a873>] reserve_bootmem_node+0x1/0x12
> [<ffffffff8096114f>] acpi_table_parse+0x4f/0x69
> [<ffffffff80955f91>] reserve_bootmem_generic+0x73/0x9e
> [<ffffffff8094edd3>] setup_arch+0x301/0x49c
> [<ffffffff8094855f>] start_kernel+0x6c/0x30b
> [<ffffffff80948144>] _sinittext+0x144/0x14b
>
> RIP reserve_bootmem_node+0x1/0x12
>
> Will keep kicking.
>

NODE_DATA(0) was returning NULL because of
x86_64-set-node_possible_map-at-runtime.patch. This patch appears to
break acpi_scan_nodes() so that setup_node_bootmem() never gets called and
node_data[] remains full of zeros. Later on, reserve_bootmem_generic() is
called but with NODE_DATA(0) == NULL, it all goes pear shaped. Author of
patch cc'd.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/