Re: [tip:timers/nohz] nohz: Move full nohz kick to its own IPI

From: Frederic Weisbecker
Date: Mon May 05 2014 - 11:04:11 EST


On Mon, May 05, 2014 at 03:31:13PM +0200, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 02:37:06PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 16, 2014 at 12:40:01AM -0700, tip-bot for Frederic Weisbecker wrote:
> > > Commit-ID: 72aacf0259bb7d53b7a3b5b2f7bf982acaa52b61
> > > Gitweb: http://git.kernel.org/tip/72aacf0259bb7d53b7a3b5b2f7bf982acaa52b61
> > > Author: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > > AuthorDate: Tue, 18 Mar 2014 21:12:53 +0100
> > > Committer: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > > CommitDate: Thu, 3 Apr 2014 18:05:21 +0200
> > >
> > > nohz: Move full nohz kick to its own IPI
> > >
> > > Now that we have smp_queue_function_single() which can be used to
> > > safely queue IPIs when interrupts are disabled and without worrying
> > > about concurrent callers, lets use it for the full dynticks kick to
> > > notify a CPU that it's exiting single task mode.
> > >
> > > This unbloats a bit the scheduler IPI that the nohz code was abusing
> > > for its cool "callable anywhere/anytime" properties.
> > >
> > > Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > > Cc: Jens Axboe <axboe@xxxxxx>
> > > Cc: Kevin Hilman <khilman@xxxxxxxxxx>
> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> >
> > So I suspect this is the patch that makes Ingo's machines unhappy, they
> > appear to get stuck thusly:
> >
> > [10513.382910] RIP: 0010:[<ffffffff8112b7da>] [<ffffffff8112b7da>] generic_exec_single+0x9a/0x180
> >
> > [10513.481704] [<ffffffff8112c092>] smp_queue_function_single+0x42/0xa0
> > [10513.488251] [<ffffffff81126ce0>] tick_nohz_full_kick_cpu+0x50/0x80
> > [10513.494661] [<ffffffff810f4b0e>] enqueue_task_fair+0x59e/0x6c0
> > [10513.506469] [<ffffffff810e3d6a>] enqueue_task+0x3a/0x60
> > [10513.511836] [<ffffffff810e8ac3>] __migrate_task+0x123/0x150
> > [10513.523535] [<ffffffff810e8b0d>] migration_cpu_stop+0x1d/0x30
> > [10513.529401] [<ffffffff81143460>] cpu_stopper_thread+0x70/0x120
> >
> > I'm not entirely sure how yet, but this is by far the most likely
> > candidate. Ingo, if you still have the vmlinuz matching this trace (your
> > hang2.txt) could you have a peek where that RIP lands?
> >
> > If that is indeed the csd_lock() function, then this is it and
> > something's buggered.
>
> On a kernel build from your .config the +0x9a is indeed very close to
> that wait loop; of course 0x9a isn't even an instruction boundary for me
> so its all a bit of a guess.

Note the current ordering:

cmpxchg(&qsd->pending, 0, 1) get ipi
csd_lock(qsd->csd) xchg(&qsd->pending, 1)
send ipi csd_unlock(qsd->csd)


So there shouldn't be racing updaters. Also ipi sender shouldn't
race with ipi receiver, the update shouldn't always eventually see
the unlock happening.

OTOH the ordering above is anything but intuitive. In fact csd_lock()
is on the way here. smp_queue_function_single() doesn't need it at
all. So it's just yet another risk for a deadlock.

One more reason why I would much prefer irq_work_queue_on(). But
I'm also not very happy with the possible result since we are going
to maintain two different APIs set doing almost the same thing with
just slightly different constraints or semantics :-(
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/