Re: [PATCH v3 2/2] sched: consider missed ticks when updating global cpu load

From: Peter Zijlstra
Date: Mon Oct 05 2015 - 04:16:19 EST


On Sun, Oct 04, 2015 at 03:58:19PM +0900, Byungchul Park wrote:
> On Fri, Oct 02, 2015 at 05:59:06PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 02, 2015 at 04:46:14PM +0900, byungchul.park@xxxxxxx wrote:
> > > From: Byungchul Park <byungchul.park@xxxxxxx>
> > >
> > > in hrtimer_interrupt(), the first tick_program_event() can be failed
> > > because the next timer could be already expired due to,
> > > (see the comment in hrtimer_interrupt())
> > >
> > > - tracing
> > > - long lasting callbacks
> >
> > If anything keeps interrupts disabled for longer than 1 tick, you'd
> > better go fix that.
>
> tracing and long lasting callbacks are not my case.
>
> >
> > > - being scheduled away when running in a VM
>
> this is my case.

I was afraid of that :/

> >
> > Not sure how much I should care about that, and this patch is completely
> > wrong for that anyhow.
>
> do you mean, in upper case, we must assume ticks occur with 1 tick interval
> when update_process_times() is called even though more than 1 tick is
> actually passed in host? right? i am really wondering it. i would admit i
> was wrong if what you mean is same as what i understand.

The first part is that I despise virt ;-), the second part is the
below argument that just fixing up some leaf function is wrong.

This check in the timer interrupt handler is purely a fail safe against
horrid behaviour and if you hit this you've lost.

> > And this case in hrtimer_interrupt() is basically a fail case, if you
> > hit that, you've got bigger problems. The solution is to rework things
> > so you don't get there.
> >
> >
> > > in the case that the first tick_program_event() is failed, the second
> > > tick_program_event() set the expired time to more than one tick later.
> > > then next tick can happen after more than one tick, even though tick is
> > > not stopped by e.g. NOHZ.
> > >
> > > when the next tick occurs, update_process_times() -> scheduler_tick()
> > > -> update_cpu_load_active() is performed, assuming the distance between
> > > last tick and current tick is 1 tick! it's wrong in this case. thus,
> > > this abnormal case should be considered in update_cpu_load_active().
> >
> > Everything in update_process_times() assumes 1 tick, just fixing up
> > one function inside that callchain is wrong -- I've already told you
> > that.
>
> anyway, it's wrong for update_process_times() to assume 1 tick because
> tick_irq_exit() -> tick_nohz_irq_exit() -> tick_nohz_full_update_tick()
> -> tick_nohz_restart_sched_tick() can happen at full NOHZ as i already
> said. in this full NOHZ case for tick to restart from non-idle,

NO_HZ_FULL is very much a work in progress, there's plenty wrong with
it. But yes, if it does this then its broken here too, I'm not sure if
Frederic is aware of this or not (I'm sure he's got a fairly big list of
broken for NO_HZ_FULL).

> 1. update_process_times() -> account_process_tick() must be able to handle
> more than one tick, or tick_nohz_restart_sched_tick() should handle the
> case additionally. (i think the latter is better.) i will try to modify
> the code to handle it if you agree with me.

Yes, and we need to audit all the other stuff called from
update_process_times().

run_local_timers() seems be ok.
rcu_check_clalbacks() also doesn't seem to care about ticks.

I _think_ we fixed most of the scheduler_tick()
stuff (under the assumption that TSC is stable), but I'm not sure.

and run_posix_cpu_timers() might also be ok.

> 2. to handle full NOHZ, tick_nohz_restart_sched_tick() should call
> update_cpu_load_active() instead of update_cpu_load_nohz() with my 1/2
> patch and 2/2 patch, or we should modify update_cpu_load_nohz() to know
> full NOHZ, which currently don't know full NOHZ. (you may agree with the
> latter.) in any case, 1/2 patch is necessary which current code is
> absolutely missing.
>
> peter, what do you think about my opinion? and about my 1/2 patch?

I did not look too closely, but it might have the right shape for
dealing with !idle ticks. I'd have to look more closely at it.

> i will modify 2/2 patch depending on your feedback.

I think it will take more than a single patch to rework all of
update_process_times(). And we should also ask Thomas for his opinion,
but I think we want:

- make update_process_times() take a nr_ticks argument
- fixup everything below it

- fix tick_nohz_handler to not ignore the hrtimer_forward()
return value and pass it into
tick_sched_handle()/update_process_times().

(assuming this is the right oneshot tick part, tick-common
seems to be about periodic timers which aren't used much ?!)


Also, there is of course the question of what time means in virt, is
there time when you're not running etc.. But I'll leave that to people
who care about such things.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/