Re: [RFC PATCH v1 04/11] sched/idle: make the fast idle path for short idle periods

From: Paul E. McKenney
Date: Wed Jul 12 2017 - 01:03:49 EST


On Wed, Jul 12, 2017 at 11:19:59AM +0800, Li, Aubrey wrote:
> On 2017/7/12 2:11, Paul E. McKenney wrote:
> > On Tue, Jul 11, 2017 at 06:33:55PM +0200, Frederic Weisbecker wrote:
> >> On Tue, Jul 11, 2017 at 05:58:47AM -0700, Paul E. McKenney wrote:
> >>> On Mon, Jul 10, 2017 at 09:38:34AM +0800, Aubrey Li wrote:
> >>>> From: Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx>
> >>>>
> >>>> The system will enter a fast idle loop if the predicted idle period
> >>>> is shorter than the threshold.
> >>>> ---
> >>>> kernel/sched/idle.c | 9 ++++++++-
> >>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> >>>> index cf6c11f..16a766c 100644
> >>>> --- a/kernel/sched/idle.c
> >>>> +++ b/kernel/sched/idle.c
> >>>> @@ -280,6 +280,8 @@ static void cpuidle_generic(void)
> >>>>   */
> >>>>  static void do_idle(void)
> >>>>  {
> >>>> +	unsigned int predicted_idle_us;
> >>>> +	unsigned int short_idle_threshold = jiffies_to_usecs(1) / 2;
> >>>>  	/*
> >>>>  	 * If the arch has a polling bit, we maintain an invariant:
> >>>>  	 *
> >>>> @@ -291,7 +293,12 @@ static void do_idle(void)
> >>>>
> >>>>  	__current_set_polling();
> >>>>
> >>>> -	cpuidle_generic();
> >>>> +	predicted_idle_us = cpuidle_predict();
> >>>> +
> >>>> +	if (likely(predicted_idle_us < short_idle_threshold))
> >>>> +		cpuidle_fast();
> >>>
> >>> What if we get here from nohz_full usermode execution? In that
> >>> case, if I remember correctly, the scheduling-clock interrupt
> >>> will still be disabled, and would have to be re-enabled before
> >>> we could safely invoke cpuidle_fast().
> >>>
> >>> Or am I missing something here?
> >>
> >> That's a good point. It's partially ok because if the tick is needed
> >> for something specific, it is not entirely stopped but programmed to that
> >> deadline.
> >>
> >> Now there is some idle-specific code when we enter dynticks-idle: see
> >> tick_nohz_start_idle(), tick_nohz_stop_idle() and sched_clock_idle_wakeup_event().
> >> Some subsystems also react differently when we enter dynticks-idle mode
> >> (scheduler_tick_max_deferment), so the tick may need a reevaluation.
> >>
> >> For now I'd rather suggest that we treat full nohz as an exception case here
> >> and do:
> >>
> >> if (!tick_nohz_full_cpu(smp_processor_id()) && likely(predicted_idle_us < short_idle_threshold))
> >> 	cpuidle_fast();
> >>
> >> Ugly but safer!
> >
> > Works for me!
>
> I guess those who enable full nohz (for example, the financial guys who need the
> system to respond as fast as possible) would not like this compromise, ;)

And some HPC guys and some real-time guys with CPU-bound real-time
processing, so there are likely quite a few different views on this
compromise.
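
For reference, here is a rough sketch of how Frederic's guard would slot into
the do_idle() hunk quoted above. This is only an illustration, not the actual
patch: cpuidle_predict(), cpuidle_fast() and cpuidle_generic() are the helpers
introduced earlier in this series, and the else branch falling back to
cpuidle_generic() is assumed from the (truncated) hunk.

static void do_idle(void)
{
	unsigned int predicted_idle_us;
	unsigned int short_idle_threshold = jiffies_to_usecs(1) / 2;

	__current_set_polling();

	predicted_idle_us = cpuidle_predict();

	/*
	 * Treat full-nohz CPUs as the exception and send them down the
	 * normal idle path unconditionally.
	 */
	if (!tick_nohz_full_cpu(smp_processor_id()) &&
	    likely(predicted_idle_us < short_idle_threshold))
		cpuidle_fast();
	else
		cpuidle_generic();

	/* ... remainder of do_idle() unchanged ... */
}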

> How about adding rcu_idle enter/exit back only for the full nohz case in fast idle?
> RCU idle is the only risky op if we remove it from the fast idle path. Compared to
> adding RCU idle back, going through the normal idle path has more overhead IMHO.

That might work, but I would need to see the actual patch. Frederic
Weisbecker should look at it as well.
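
To make that concrete, here is a minimal sketch of the idea as I read it, not
the actual patch being asked for: keep RCU out of the common fast-idle case,
but tell RCU about the idle period when the CPU runs full nohz. cpuidle_fast()
and cpuidle_generic() are from this series; rcu_idle_enter()/rcu_idle_exit()
are the existing RCU hooks.

	if (likely(predicted_idle_us < short_idle_threshold)) {
		bool full_nohz = tick_nohz_full_cpu(smp_processor_id());

		/* Only full-nohz CPUs pay for the RCU idle transitions. */
		if (full_nohz)
			rcu_idle_enter();
		cpuidle_fast();
		if (full_nohz)
			rcu_idle_exit();
	} else {
		cpuidle_generic();
	}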

Thanx, Paul