Re: [bug] Re: [PATCH] - Fix stack overflow for large values of MAX_APICS

From: Jack Steiner
Date: Thu Jun 26 2008 - 18:58:39 EST

Next message: Andrew Morton: "Re: [PATCH 2/3] i/o bandwidth controller infrastructure"
Previous message: Chris Snook: "Re: GFP_ATOMIC page allocation failures."
In reply to: Paul E. McKenney: "Re: [bug] Re: [PATCH] - Fix stack overflow for large values of MAX_APICS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Jun 26, 2008 at 03:38:55PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 26, 2008 at 07:41:25AM -0500, Jack Steiner wrote:
> > On Thu, Jun 26, 2008 at 02:32:31PM +0200, Ingo Molnar wrote:
> > >
> > > * Jack Steiner <steiner@xxxxxxx> wrote:
> > >
> > > > >> I added trace code & isolated the hang to a call to
> > > > >> synchronize_rcu(). Usually from netlink_change_ngroups().
> > > > >>
> > > > >> If I boot with "maxcpus=1, it never hangs (obviously) but always fails
> > > > >> to mount /root.
> > > > >>
> > > > >> Next I changed NR_CPUS to 128. I still see random hangs.
> > > > >>
> > > > >>
> > > > >> I'll chase this more tomorrow. Has anyone else seen any failures that might be
> > > > >> related???
> > > > >>
> > > > >>
> > > >
> > > > Is this already fixed? I see a number of patches to this area have been merged
> > > > since the failure occurred.
> > > >
> > > > I added enough hacks to get backtraces on threads at the time a hang occurs.
> > > > show_state() shows 79 "kstopmachine" tasks. Most have one of the following backtraces:
> > > >
> > > >
> > > > <6>kstopmachine R running task 6400 375 369
> > > > ffff8101ad28bd80 ffffffff8068c5c6 ffff8101ad28bb20 0000000000000002
> > > > 0000000000000046 0000000000000000 0000000000002f42 ffff8101ad28c8b8
> > > > ffff8101ad28bb90 ffffffff80254fac 0000000100000000 0000000000000000
> > > > Call Trace:
> > > > [<ffffffff8068c5c6>] ? thread_return+0x4d/0xbd
> > > > [<ffffffff80254fac>] ? __lock_acquire+0x643/0x6ad
> > > > [<ffffffff80212507>] ? sched_clock+0x9/0xc
> > > > [<ffffffff8022cf41>] ? update_curr_rt+0x111/0x11a
> > > > [<ffffffff80212507>] ? sched_clock+0x9/0xc
> > > > [<ffffffff8068c5f2>] ? thread_return+0x79/0xbd
> > > > [<ffffffff8068ef48>] ? _spin_unlock_irq+0x2b/0x37
> > > > [<ffffffff8068c5f2>] ? thread_return+0x79/0xbd
> > > > [<ffffffff80254fac>] ? __lock_acquire+0x643/0x6ad
> > > > [<ffffffff80212507>] ? sched_clock+0x9/0xc
> > > > [<ffffffff8068c806>] wait_for_common+0x150/0x160
> > > > [<ffffffff8068ef48>] ? _spin_unlock_irq+0x2b/0x37
> > > > [<ffffffff80254fac>] ? __lock_acquire+0x643/0x6ad
> > > > [<ffffffff80212507>] ? sched_clock+0x9/0xc
> > > > [<ffffffff8023422b>] ? sys_sched_yield+0x0/0x6e
> > > > [<ffffffff8026736d>] ? stopmachine+0xaf/0xda
> > > > [<ffffffff8020d558>] ? child_rip+0xa/0x12
> > > > [<ffffffff802672be>] ? stopmachine+0x0/0xda
> > > > [<ffffffff8020d54e>] ? child_rip+0x0/0x12
> > > >
> > > > <6>kstopmachine ? 0000000000000000 6400 367 1
> > > > ffff8101af9b9ee0 0000000000000046 0000000000000000 0000000000000000
> > > > 0000000000000000 ffff8101af9b4000 ffff8101afdc0000 ffff8101af9b4540
> > > > 0000000600000000 00000000ffff909f ffffffffffffffff ffffffffffffffff
> > > > Call Trace:
> > > > [<ffffffff8023be98>] do_exit+0x6fe/0x702
> > > > [<ffffffff8020d55f>] child_rip+0x11/0x12
> > > > [<ffffffff802672be>] ? stopmachine+0x0/0xda
> > > > [<ffffffff8020d54e>] ? child_rip+0x0/0x12
> > > >
> > > >
> > > > The boot thread shows:
> > > > <6>swapper D 0000000000000002 2640 1 0
> > > > ffff8101afc3fcd0 0000000000000046 ffffffff807d8341 0000000000000200
> > > > ffffffff807d8335 ffff8101afc40000 ffff8101ad284000 ffff8101afc40540
> > > > 00000005afc3faa0 ffffffff8021e837 ffff8101afc3fab0 ffff8101afc3fd50
> > > >
> > > > [<ffffffff8068c961>] schedule_timeout+0x27/0xb9
> > > > [<ffffffff8068ef48>] ? _spin_unlock_irq+0x2b/0x37
> > > > [<ffffffff8068c79c>] wait_for_common+0xe6/0x160
> > > > [<ffffffff8022d88a>] ? default_wake_function+0x0/0xf
> > > > [<ffffffff8068c8a0>] wait_for_completion+0x18/0x1a
> > > > [<ffffffff8024981a>] synchronize_rcu+0x3a/0x41
> > > > [<ffffffff802498a3>] ? wakeme_after_rcu+0x0/0x15
> > > > [<ffffffff805d8e1b>] netlink_change_ngroups+0xce/0xfc
> > > > [<ffffffff805da2c9>] genl_register_mc_group+0xfd/0x160
> > > > [<ffffffff80ac6d5d>] ? acpi_event_init+0x0/0x57
> > > > [<ffffffff80ac6d92>] acpi_event_init+0x35/0x57
> > > > [<ffffffff80aaca8c>] kernel_init+0x1c5/0x31f
> > > >
> > > >
> > > > Is this hang already fixed or should I dig deeper?
> > >
> > > there's no known hang in tip/master. I.e. removing your MAX_APICS patch
> > > clearly resolved that crash.
> >
> > Hmmm. I'm puzzled. With the tip/master tree that I built earlier this week, I was
> > able to get hangs both with & without the MAX_APICS patch. Although less frequent,
> > I also got hangs with NR_CPUS=128 & without the MAX_APICS patch. I'm not certain
> > that all hangs were identical to the above backtraces, but they all hung at
> > about the same spot.
> >
> > I'll build a new tip/master tree, apply the MAX_APICS patch and retest using
> > your random config & boot options that caused the problem.
>
> Is the kstopmachine related to the synchronize_rcu()? A task that is
> running in a tight loop with irqs disabled (as stopmachine() seems to do,
> judging from the source code) will absolutely prevent RCU grace periods
> from completing.
>
> So if the stop_machine() sequence gets stuck for whatever reason, RCU
> grace periods will get stuck as well.

I'm still debugging. (I'm learning a lot about debugging x86 systems - up to now
I've worked mostly with IA64). Stopmachine is definitely part of the problem.

The failure mode is not always the same. In the most recent failure, RCU
is not involved. The hang in stopmachine().

I see the following task spinning on cpu 7 at the time of the hang:

HUNG: registers 7
CPU 7:
Modules linked in:
Pid: 259, comm: kstopmachine Not tainted 2.6.26-rc8-tip-bug3 #8
Stime: 31957, LastSwitch 0, SwitchCount 0
RIP: 0010:[<ffffffff80688368>] [<ffffffff80688368>] _spin_unlock_irq+0x33/0x37
RSP: 0000:ffff8101af50dc40 EFLAGS: 00000282
RAX: 0000000000000007 RBX: ffff8101af50dc50 RCX: 0000000000000000
RDX: 000000000000dee9 RSI: 00000000000002c1 RDI: 0000000000000000
RBP: ffff8101af508870 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8101af508870
R13: 0000000000000002 R14: ffff8101af50dbc0 R15: ffffffff8022ccb1
FS: 0000000000000000(0000) GS:ffff8101afd226e0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
[<ffffffff80685a0a>] ? thread_return+0x79/0xbd
[<ffffffff80254d74>] ? __lock_acquire+0x643/0x6ad
[<ffffffff802125af>] ? sched_clock+0x9/0xc
[<ffffffff80685c1e>] ? wait_for_common+0x150/0x160
[<ffffffff80688360>] ? _spin_unlock_irq+0x2b/0x37
[<ffffffff80254d74>] ? __lock_acquire+0x643/0x6ad
[<ffffffff802125af>] ? sched_clock+0x9/0xc
[<ffffffff80233f9f>] ? sys_sched_yield+0x0/0x6e
[<ffffffff80234007>] ? sys_sched_yield+0x68/0x6e
[<ffffffff80685cd1>] ? yield+0x17/0x19
[<ffffffff80267133>] ? stopmachine+0xad/0xda
[<ffffffff8020d618>] ? child_rip+0xa/0x12
[<ffffffff80267086>] ? stopmachine+0x0/0xda
[<ffffffff8020d60e>] ? child_rip+0x0/0x12

Another task is:

kstopmachine R running task 6016 251 2 2
ffff8101af53dec0 0000000000000046 00000000000000fd ffff8101af538870
ffff8101af53dcb0 ffff8101af538000 ffff8101afcac000 ffff8101af538548
0000000200000000 ffff8101af538000 ffff8101af53dc90 0000000000000046
Call Trace:
[<ffffffff802125af>] ? sched_clock+0x9/0xc
[<ffffffff8023523d>] ? wake_up_new_task+0xef/0xf8
[<ffffffff8023523d>] ? wake_up_new_task+0xef/0xf8
[<ffffffff80237301>] ? do_fork+0x1b8/0x284
[<ffffffff80254d74>] ? __lock_acquire+0x643/0x6ad
[<ffffffff802125af>] ? sched_clock+0x9/0xc
[<ffffffff80233f9f>] ? sys_sched_yield+0x0/0x6e
[<ffffffff80234007>] sys_sched_yield+0x68/0x6e
[<ffffffff80685cd1>] yield+0x17/0x19
[<ffffffff80266fd3>] do_stop+0x87/0x13a
[<ffffffff80266f4c>] ? do_stop+0x0/0x13a
[<ffffffff80266f4c>] ? do_stop+0x0/0x13a
[<ffffffff8024b7fe>] kthread+0x4e/0x7c
[<ffffffff8020d618>] child_rip+0xa/0x12
[<ffffffff8024b7b0>] ? kthread+0x0/0x7c
[<ffffffff8020d60e>] ? child_rip+0x0/0x12

AFAICT, it is spinning in stopmachine() in the following loop:

while (stopmachine_state != STOPMACHINE_EXIT)

stopmachine_state = 0
stopmachine_num_threads = 8
stopmachine_thread_ack = 7

Still looking. Should know more tomorrow.

--- jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andrew Morton: "Re: [PATCH 2/3] i/o bandwidth controller infrastructure"
Previous message: Chris Snook: "Re: GFP_ATOMIC page allocation failures."
In reply to: Paul E. McKenney: "Re: [bug] Re: [PATCH] - Fix stack overflow for large values of MAX_APICS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]