Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

From: Frederic Weisbecker
Date: Fri Dec 29 2017 - 11:12:22 EST


On Wed, Dec 27, 2017 at 09:58:08PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> > Bah, no. We need to move that into the nohz logic somehow to prevent that
> > repetitive expiry yesterday reprogramming. Lemme think about it some more.
>
> The patch below should be the proper cure.
>
> Thanks,
>
> tglx
>
> 8<-------------------
> Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Fri, 22 Dec 2017 15:51:13 +0100
>
> The conditions in tick_irq_exit(), called from irq_exit(), to invoke
> tick_nohz_irq_exit(), which subsequently invokes tick_nohz_stop_sched_tick(), are:
>
> if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
>
> If need_resched() is not set but a timer softirq is pending, this is an
> indication that the softirq code punted and delegated the execution to
> ksoftirqd. need_resched() is not true because the current interrupted task
> takes precedence over ksoftirqd.
>
> Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
> timer interrupts: the timer wheel contains an expired timer, but softirqs
> have not been executed yet. get_next_timer_interrupt() therefore returns an
> immediate expiry request, which makes the timer fire again immediately.
> Lather, rinse and repeat...
>
> Prevent that by adding a check for a pending timer softirq to the
> conditions in tick_nohz_stop_sched_tick() which avoid calling
> get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
> prevents repeated programming of an already expired timer.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Sebastian Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Paul McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
>
> ---
> kernel/time/tick-sched.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
> ts->next_tick = 0;
> }
>
> +static inline bool local_timer_softirq_pending(void)
> +{
> + return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> +}
> +
> static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> ktime_t now, int cpu)
> {
> @@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
> } while (read_seqretry(&jiffies_lock, seq));
> ts->last_jiffies = basejiff;
>
> - if (rcu_needs_cpu(basemono, &next_rcu) ||
> - arch_needs_cpu() || irq_work_needs_cpu()) {
> + if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
> + irq_work_needs_cpu() || local_timer_softirq_pending()) {

Much better. This may need a comment, though, because it's not immediately
obvious why we need this check when softirqs are processed just before
tick_irq_exit().

Thanks.

Acked-by: Frederic Weisbecker <frederic@xxxxxxxxxx>