Re: 3.9.2: xfstests triggered panic

From: Dave Chinner
Date: Wed May 22 2013 - 23:46:23 EST


On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> ----- Original Message -----
> > From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> > To: "CAI Qian" <caiqian@xxxxxxxxxx>
> > Cc: "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, stable@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
> > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > Subject: Re: 3.9.2: xfstests triggered panic
> >
> > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > Reproduced on almost all s390x guests by running xfstests.
> > >
> > > 14634.396658 XFS (dm-1): Mounting Filesystem
> > > 14634.525522 XFS (dm-1): Ending clean mount
> > > 14640.413007 <000000000017c6d4> idle_balance+0x1a0/0x340
> > > 14640.413010 <000000000063303e> __schedule+0xa22/0xaf0
> > > 14640.428279 <0000000000630da6> schedule_timeout+0x186/0x2c0
> > > 14640.428289 <00000000001cf864> rcu_gp_kthread+0x1bc/0x298
> > > 14640.428300 <0000000000158c5a> kthread+0xe6/0xec
> > > 14640.428304 <0000000000634de6> kernel_thread_starter+0x6/0xc
> > > 14640.428308 <0000000000634de0> kernel_thread_starter+0x0/0xc
> > > 14640.428311 Last Breaking-Event-Address:
> > > 14640.428314 <000000000016bd76> walk_tg_tree_from+0x3a/0xf4
> > > 14640.428319 list_add corruption. next->prev should be prev (0000000000000918), but was (null). (next= (null)).
> >
> > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > code. This kind of implies a stack corruption....
> >
> > > Sometimes, this pops up,
> > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > >
> > > or this,
> > > 15316.154171 XFS (dm-1): Mounting Filesystem
> > > 15316.255796 XFS (dm-1): Ending clean mount
> > > 15320.364246 00000000006367a2: e310b0080004 lg %r1,8(%r11)
> > > 15320.364249 00000000006367a8: 41101010 la %r1,16(%r1)
> > > 15320.364251 00000000006367ac: e33010000004 lg %r3,0(%r1)
> > > 15320.364252 Call Trace:
> > > 15320.364252 Last Breaking-Event-Address:
> > > 15320.364253 <0000000000000000> Kernel stack overflow.
> > > 15320.364308 CPU: 0 Tainted: GF W 3.9.2 #1
> > > 15320.364309 Process rhts-test-runne (pid: 625, task: 000000003dccc890, ksp: 0
> >
> > .... and there you go - a stack overflow. Your kernel stack size is
> > too small.
> >
> > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > call has 128 byte stack frame, and there are call chains 70-80
> > functions deep in the storage stack...
> Hmm, I am unsure how to set to 16k stack there

Are you building a 64-bit s390 kernel or a 32-bit kernel? 32-bit
kernels only have an 8k stack; 64-bit kernels have 16k (see
arch/s390/Makefile).

$ git grep STACK_SIZE arch/s390 |head -2
arch/s390/Makefile:STACK_SIZE := 8192
arch/s390/Makefile:STACK_SIZE := 16384

As it is, the stack frame usage is worse than I thought:

$ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96 /* size of minimum stack frame */
arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160 /* size of minimum stack frame */

The minimum frame is 96 bytes for 32-bit and 160 bytes for 64-bit.
So even a 16k stack is going to be in big trouble with a call chain
70-80 functions deep.
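
To put rough numbers on that, here is a minimal sketch in Python. It uses
only the STACK_SIZE and STACK_FRAME_OVERHEAD values grepped above; the
function names are made up for illustration, and it assumes every frame is
the minimum size, so real usage with locals and spills would be higher.

```python
# Values quoted above from arch/s390 (Makefile and uapi/asm/ptrace.h).
STACK_SIZE = {"s390 32-bit": 8192, "s390 64-bit": 16384}
STACK_FRAME_OVERHEAD = {"s390 32-bit": 96, "s390 64-bit": 160}


def min_frame_usage(arch, depth):
    """Bytes used by `depth` nested calls if every frame is minimum size."""
    return STACK_FRAME_OVERHEAD[arch] * depth


def report(arch, depths=(70, 80)):
    """Return one summary line per call depth for the given arch."""
    size = STACK_SIZE[arch]
    lines = []
    for depth in depths:
        used = min_frame_usage(arch, depth)
        lines.append(
            f"{arch}: depth {depth}: {used} of {size} bytes "
            f"({100 * used // size}%) before any locals"
        )
    return lines


if __name__ == "__main__":
    for arch in STACK_SIZE:
        print("\n".join(report(arch)))
```

For the 64-bit case this comes to 11200-12800 bytes of the 16384 available
in frame overhead alone, before a single local variable is counted.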

As for powerpc:

arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256

Yeah, same issue.
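
A quick way to see it is to turn the question around and ask how many
frames fit. The sketch below assumes a 16k kernel stack on powerpc (that
size is not grepped above, so treat it as an assumption) and takes the
256-byte STACKFRAMESIZE at face value as the per-call cost; max_depth is
just an illustrative helper.

```python
STACKFRAMESIZE = 256        # from arch/powerpc/include/asm/ppc_asm.h, quoted above
ASSUMED_STACK_SIZE = 16384  # assumption: 16k kernel stack, not taken from the tree


def max_depth(stack_size, frame_size):
    """Number of whole frames of `frame_size` bytes that fit in the stack."""
    return stack_size // frame_size


if __name__ == "__main__":
    depth = max_depth(ASSUMED_STACK_SIZE, STACKFRAMESIZE)
    print(f"{depth} frames of {STACKFRAMESIZE} bytes fit in {ASSUMED_STACK_SIZE} bytes")
```

Under those assumptions only 64 frames fit, which is already short of a
70-80 deep call chain.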

But, seriously, these stack traces are meaningless to anyone not
familiar with s390 or power7 - they show where the problem was
detected (the idle loop), not where the stack actually overran.

Can you please work with the s390/power7 people to obtain a trace of
the stack that actually overflowed? Then we can go from there.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx