Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

From: Frederic Weisbecker
Date: Thu May 16 2013 - 19:14:36 EST

On Thu, May 16, 2013 at 02:32:58PM -0400, Steven Rostedt wrote:
> On Thu, 2013-05-16 at 19:56 +0200, Peter Zijlstra wrote:
> > I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
> > functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
> > and re-enable all sorts of things?
> It shouldn't. The nmi_enter() notifies RCU that it can no longer ignore
> this CPU, whereas nmi_exit() tells it that it can ignore it again, as it has
> re-entered user space.
> >
> > Because clearly RCU needs to exit from EQS, which might (or might not) mean
> > leaving NO_HZ_FULL.
> Yep, but the two are pretty much agnostic from each other.
> We only need to leave NO_HZ_FULL if RCU (or anything for that matter)
> required having a tick again. But as Paul said, getting an NMI in idle
> won't restart the tick, so there's no need to restart it here either.
> Now if an NMI were to do a call_rcu() then it would require a tick. But
> NMIs doing call_rcu() have much bigger issues to worry about ;-)

Actually, even calling call_rcu() won't restart the tick, because the callback
and the grace period lifecycle that comes with it are handled by the RCU nocb
kthreads. If you have migrated these kthreads accordingly, this is handled on the
housekeeping CPU. Of course, calling call_rcu() from an NMI involves more problems ;)

In fact we never need to restart the tick for RCU. Even round-trips in the kernel
that are potentially longer than irqs/NMIs, such as I/O syscalls or exceptions, are
fine, because either they are actually short and quickly return to user mode, or they
sleep and the CPU goes idle. The result is the same either way: RCU idle mode.
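
To illustrate the reasoning, here is a toy user-space model in Python. It is
not kernel code: the Ctx enum and both function names are made up for this
sketch, and it only assumes what is said above, namely that RCU can ignore a
CPU that is running in user mode or is idle.

```python
from enum import Enum


class Ctx(Enum):
    """Hypothetical CPU contexts for this sketch (not kernel names)."""
    USER = "user"
    KERNEL = "kernel"
    IDLE = "idle"


def rcu_can_ignore(ctx):
    """In this toy model, RCU can ignore a CPU in user mode or idle."""
    return ctx in (Ctx.USER, Ctx.IDLE)


def after_kernel_round_trip(sleeps):
    """Where the CPU ends up after a syscall or exception.

    A short round-trip returns to user mode; one that sleeps leaves the
    CPU idle (assuming nothing else is runnable on it).
    """
    return Ctx.IDLE if sleeps else Ctx.USER


# Both outcomes leave the CPU in a state RCU can ignore.
for sleeps in (False, True):
    print(sleeps, rcu_can_ignore(after_kernel_round_trip(sleeps)))
```

The model deliberately leaves out the remaining case, a task that stays in
Ctx.KERNEL for a long time without sleeping.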

There is just one possible exception that is not yet completely handled: if
a task stays in the kernel too long without sleeping, it may extend a
grace period dangerously (there is no tick to report quiescent states).
In this case we should restart the tick. This is only half implemented
currently: RCU sends IPIs to CPUs that cause these excessive grace period
extensions, but the CPU that receives the IPI doesn't yet detect the issue
and doesn't restart the tick. That's on the TODO list.
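
The two halves can be sketched as a toy user-space model in Python. Again,
this is not kernel code: the Cpu class, HOLDOUT_LIMIT, and both function names
are hypothetical, and the time units are arbitrary. The restart_tick flag
stands in for the missing receiver-side handling.

```python
from dataclasses import dataclass

# Hypothetical threshold: how long a CPU may stay in the kernel without
# sleeping before this sketch treats it as holding up the grace period.
HOLDOUT_LIMIT = 3


@dataclass
class Cpu:
    time_in_kernel: int = 0    # arbitrary units, no sleep in between
    got_ipi: bool = False
    tick_running: bool = False


def rcu_send_ipis(cpus, limit=HOLDOUT_LIMIT):
    """Implemented half: flag CPUs that extend the grace period too long."""
    for cpu in cpus:
        if cpu.time_in_kernel > limit:
            cpu.got_ipi = True


def handle_ipi(cpu, restart_tick):
    """Receiver half.

    restart_tick=False models the current state described above (the IPI
    arrives but nothing restarts the tick); restart_tick=True models the
    TODO item.
    """
    if cpu.got_ipi and restart_tick:
        cpu.tick_running = True
    cpu.got_ipi = False


short_trip = Cpu(time_in_kernel=1)
long_trip = Cpu(time_in_kernel=10)
rcu_send_ipis([short_trip, long_trip])

handle_ipi(long_trip, restart_tick=False)
print(long_trip.tick_running)   # tick stays off: the unhandled case
```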