Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ

From: Paul E. McKenney
Date: Thu May 16 2013 - 09:14:55 EST


On Thu, May 16, 2013 at 11:37:40AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 09:37:00AM -0700, Paul E. McKenney wrote:
> > The need is to detect that an idle CPU is idle without making it do
> > anything. To do otherwise would kill battery lifetime and introduce
> > OS jitter.
>
> Not anything isn't leaving us much room to wriggle, we could maybe try and do a
> wee bit without people shooting us :-) In fact, looking at rcu_idle_enter()
> it's very much not an empty function.

That said, it operates on CPU-local variables, so the first idle/nonidle
transition after an FQS scan is expensive, but subsequent transitions
will not incur communications cache misses.

> > This other CPU must be able to correctly detect idle CPUs regardless of
> > how long they have been idle. In particular, it is necessary to detect
> > CPUs that were idle at the start of the current grace period and have
> > remained idle throughout the entirety of the current grace period.
>
> OK, so continuing this hypothetical merry go round :-)
>
> Since RCU is a global endeavour, I'm assuming there is a global GP sequence
> number. Could we not stamp the CPU with the current GP# in rcu_idle_enter().

We could do that. But we would still need to store it surrounded by
memory barriers, and we would still need to scan it every grace period
to which the CPU did not otherwise respond.

This would get rid of atomic-instruction overhead, but the atomic part
of the increment is on my list to eliminate in any case.

Furthermore, if we stamp the CPU with the last grace period during which
it was non-idle (which needs to be the case for the non-idle-to-idle
transition), we cannot tell whether or not that CPU went idle during
the current grace period if it was non-idle during that grace period.
In contrast, the current scheme can detect arbitrarily short idle
sojourns, regardless of the current state of the CPU.
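
For readers following along, here is a rough user-space sketch of the
counter idea behind "the current scheme". It is not the kernel's code:
the struct and function names are made up for illustration, and C11
sequentially consistent atomics stand in for the kernel's atomic
increment plus memory barriers. The counter is assumed to be odd while
the CPU is non-idle and even while it is idle, so any change from an
odd snapshot implies at least one pass through idle.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: even counter = idle, odd = non-idle. */
struct cpu_dynticks {
	atomic_uint dynticks;
};

/* Start the CPU out as non-idle (odd counter value). */
static void dynticks_init(struct cpu_dynticks *cd)
{
	atomic_init(&cd->dynticks, 1);
}

/* Non-idle -> idle: counter goes odd -> even (fully ordered RMW). */
static void idle_enter(struct cpu_dynticks *cd)
{
	atomic_fetch_add(&cd->dynticks, 1);
}

/* Idle -> non-idle: counter goes even -> odd (fully ordered RMW). */
static void idle_exit(struct cpu_dynticks *cd)
{
	atomic_fetch_add(&cd->dynticks, 1);
}

/* Scanning CPU: record the counter early in the grace period. */
static unsigned int dynticks_snapshot(struct cpu_dynticks *cd)
{
	return atomic_load(&cd->dynticks);
}

/*
 * Scanning CPU, later in the same grace period: has the target CPU
 * been idle at any point since the snapshot was taken?
 *
 * - Even snapshot: it was idle when the snapshot was taken.
 * - Odd snapshot that has since changed: it entered idle at least
 *   once, however briefly, even if it is non-idle again by now.
 */
static bool passed_through_idle(struct cpu_dynticks *cd, unsigned int snap)
{
	unsigned int curr = atomic_load(&cd->dynticks);

	return (snap & 1) == 0 || curr != snap;
}
```

The target CPU touches only its own counter, and the scanning CPU only
reads it, so a short idle sojourn between the snapshot and the recheck
still shows up as a changed counter value.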

> > A CPU might transition between idle and non-idle states at any time.
> > Therefore, if RCU collects a given CPU's idleness state during a given
> > grace period, it must be very careful to avoid relying on that state
> > during some other grace period.
>
> However, if we know during which GP it became idle, we know we can ignore it
> for all GPs thereafter, right?

Yes, but as noted above, we wouldn't know to ignore it during the GP during
which it became idle, which is quite important -- many workloads have
short idle sojourns, e.g., due to interrupts arriving at an otherwise
idle CPU.

> > Therefore, from what I can see, unless all CPUs explicitly report a
> > quiescent state in a timely fashion during a given grace period (in
> > which case each CPU was non-idle at some point during that grace period),
> > there is no alternative to polling RCU's per-CPU rcu_dynticks structures
> > during that grace period. In particular, if at least one CPU remained
> > idle throughout that grace period, it will be necessary to poll.
>
> Agreed..
>
> > Of course, during boot time, there are often long time periods during
> > which at least one CPU remains idle. Therefore, we can expect many
> > boot-time grace periods to delay for at least one FQS time period.
> >
> > OK, so how much delay does this cause?
>
> Oh, I'm so way past that, it is a neat puzzle by now ;-)

I can identify with that feeling! ;-)

Thanx, Paul
