Re: linux-next: PowerPC boot failures in next-20120521

From: David Rientjes
Date: Mon May 21 2012 - 22:25:04 EST


On Tue, 22 May 2012, Michael Neuling wrote:

> console [tty0] enabled
> console [hvc0] enabled
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> Unable to handle kernel paging request for data at address 0x00001388
> Faulting instruction address: 0xc00000000014a070
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in:
> NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
> REGS: c00000007e5836b0 TRAP: 0300 Tainted: G W (3.4.0-rc6-mikey)
> MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28004022 XER: 02000000
> SOFTE: 1
> CFAR: 00000000000050fc
> DAR: 0000000000001388, DSISR: 40000000
> TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
> GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0
> GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001
> GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000
> GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380
> GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001
> GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa
> GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200
> NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
> LR [c0000000001978cc] .new_slab+0xcc/0x3d0
> Call Trace:
> [c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
> [c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
> [c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
> [c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
> [c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
> [c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
> [c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
> [c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
> Instruction dump:
> 0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000
> 41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000
> ---[ end trace 31fd0ba7d8756002 ]---
>
> Which seems to be this code in __alloc_pages_nodemask
> ---
> 	/*
> 	 * Check the zones suitable for the gfp_mask contain at least one
> 	 * valid zone. It's possible to have an empty zonelist as a result
> 	 * of GFP_THISNODE and a memoryless node
> 	 */
> 	if (unlikely(!zonelist->_zonerefs->zone))
> c00000000014a070: e9 3a 00 08 ld r9,8(r26)
> ---
>
> r26 is coming from r5 which is the struct zonelist *zonelist parameter
> to __alloc_pages_nodemask. Having 0000000000001380 in there is clearly
> a bogus pointer.
>
> Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
> b4cdf91 sched/numa: Implement numa balancer
>
> Trying David's patch just posted doesn't fix it.
>
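
The numbers in the oops are consistent with that reading: the faulting
load is "ld r9,8(r26)", r26 holds 0x1380, and DAR is 0x1388, which is
exactly 0x1380 + 8. A minimal sketch of the arithmetic, assuming a
64-bit build where one pointer-sized member (called zlcache_ptr here)
precedes _zonerefs in struct zonelist. The struct definitions below are
simplified stand-ins for illustration, not the kernel's actual headers,
and faulting_address() and check_oops_numbers() are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel types. The field order is an
 * assumption (NUMA build, 64-bit pointers), not copied from the tree
 * that produced the oops.
 */
struct zone;				/* opaque for this sketch */

struct zoneref {
	struct zone *zone;		/* pointer to the actual zone */
	int zone_idx;			/* index of that zone */
};

struct zonelist {
	void *zlcache_ptr;		/* assumed first member, 8 bytes */
	struct zoneref _zonerefs[1];	/* first entry is what gets tested */
};

/* Address touched when reading zonelist->_zonerefs->zone. */
uintptr_t faulting_address(uintptr_t zonelist)
{
	return zonelist + offsetof(struct zonelist, _zonerefs)
			+ offsetof(struct zoneref, zone);
}

/* Cross-check against the register dump: r26 = 0x1380, DAR = 0x1388. */
void check_oops_numbers(void)
{
	assert(faulting_address(0x1380) == 0x1388);
}
```

So the fault is the very first dereference of the zonelist argument,
which only says the pointer was already bad on entry to
__alloc_pages_nodemask, not where the bad value was computed.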

Hmm, what does CONFIG_DEBUG_VM say?