Re: [PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data for scalable detection of all-idle state

From: Paul E. McKenney
Date: Tue Jul 09 2013 - 09:58:45 EST


On Tue, Jul 09, 2013 at 11:37:28AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 08, 2013 at 06:30:01PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> >
> > This commit adds fields to the rcu_dyntick structure that are used to
> > detect idle CPUs. These new fields differ from the existing ones in
> > that the existing ones consider a CPU executing in user mode to be idle,
> > where the new ones consider CPUs executing in user mode to be busy.
> > The handling of these new fields is otherwise quite similar to that for
> > the existing fields. This commit also adds the initialization required
> > for these fields.
> >
> > So, why is usermode execution treated differently, with RCU considering
> > it a quiescent state equivalent to idle, while in contrast the new
> > full-system idle state detection considers usermode execution to be
> > non-idle?
> >
> > It turns out that although one of RCU's quiescent states is usermode
> > execution, it is not a full-system idle state. This is because the
> > purpose of the full-system idle state is not RCU, but rather determining
> > when accurate timekeeping can safely be disabled. Whenever accurate
> > timekeeping is required in a CONFIG_NO_HZ_FULL kernel, at least one
> > CPU must keep the scheduling-clock tick going. If even one CPU is
> > executing in user mode, accurate timekeeping is required, particularly for
> > architectures where gettimeofday() and friends do not enter the kernel.
> > Only when all CPUs are really and truly idle can accurate timekeeping be
> > disabled, allowing all CPUs to turn off the scheduling clock interrupt,
> > thus greatly improving energy efficiency.
> >
> > This naturally raises the question "Why is this code in RCU rather than in
> > timekeeping?", and the answer is that RCU has the data and infrastructure
> > to efficiently make this determination.
>
> but but but but... why doesn't the regular nohz code qualify? I'd think
> that too would be tracking pretty much the same things, no?

The regular nohz code is identifying which CPUs are idle, but is doing
so on a CPU-by-CPU basis. Before turning off system-wide timekeeping,
we need to know that -all- of the CPUs are idle. The regular nohz code
does not do this.
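
As a rough userspace illustration of the difference (not the code from
this patch series), the sketch below contrasts the per-CPU view with the
all-CPUs check. The names cpu_mode, rcu_sees_idle(), sysidle_sees_idle(),
and full_system_idle() are hypothetical, and the simple linear scan is an
assumption made for brevity; it does not model the scalable detection
that the series is aiming for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU execution modes, for illustration only. */
enum cpu_mode { CPU_MODE_IDLE, CPU_MODE_USER, CPU_MODE_KERNEL };

/* RCU's per-CPU view: usermode execution is a quiescent state,
 * so it is treated the same as idle. */
static bool rcu_sees_idle(enum cpu_mode m)
{
	return m != CPU_MODE_KERNEL;
}

/* Timekeeping's per-CPU view: a CPU running in user mode may still
 * need accurate time, so only a truly idle CPU counts as idle. */
static bool sysidle_sees_idle(enum cpu_mode m)
{
	return m == CPU_MODE_IDLE;
}

/* The system-wide question: is every CPU idle?  A per-CPU check
 * alone cannot answer this.  (Assumption: a plain linear scan.) */
static bool full_system_idle(const enum cpu_mode *modes, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++)
		if (!sysidle_sees_idle(modes[i]))
			return false;
	return true;
}
```

With three CPUs where one is in user mode, rcu_sees_idle() holds for each
CPU individually, yet full_system_idle() is false, so at least one CPU
would have to keep the scheduling-clock tick going.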

Thanx, Paul
