Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full

From: Paul E. McKenney
Date: Tue Mar 19 2024 - 05:18:10 EST


On Tue, Mar 19, 2024 at 12:07:29AM +0100, Frederic Weisbecker wrote:
> While running in nohz_full mode, a task may enqueue a timer while the
> tick is stopped. However the only places where the timer wheel,
> alongside the timer migration machinery's decision, may reprogram the
> next event accordingly with that new timer's expiry are the idle loop or
> any IRQ tail.
>
> However neither the idle task nor an interrupt may run on the CPU if it
> resumes busy work in userspace for a long while in full dynticks mode.
>
> To solve this, the timer enqueue path raises a self-IPI that will
> re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
> avoids potential locking inversion.
>
> This is supposed to happen both for local and global timers but commit:
>
> b2cf7507e186 ("timers: Always queue timers on the local CPU")
>
> broke the global timers case with removing the ->is_idle field handling
> for the global base. As a result, global timers enqueue may go unnoticed
> in nohz_full.
>
> Fix this with restoring the idle tracking of the global timer's base,
> allowing self-IPIs again on enqueue time.

Testing with the previous patch (1/2 in this series) reduced the number of
problems by about an order of magnitude, down to two sched_tick_remote()
instances and one enqueue_hrtimer() instance, very good!

I have kicked off a test including this patch. Here is hoping! ;-)

Thanx, Paul

> Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Fixes: b2cf7507e186 ("timers: Always queue timers on the local CPU")
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> ---
> kernel/time/timer.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index e69e75d3858c..dee29f1f5b75 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -642,7 +642,8 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> * the base lock:
> */
> if (base->is_idle) {
> - WARN_ON_ONCE(!(timer->flags & TIMER_PINNED));
> + WARN_ON_ONCE(!(timer->flags & TIMER_PINNED ||
> + tick_nohz_full_cpu(base->cpu)));
> wake_up_nohz_cpu(base->cpu);
> }
> }
> @@ -2292,6 +2293,13 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
> */
> if (!base_local->is_idle && time_after(nextevt, basej + 1)) {
> base_local->is_idle = true;
> + /*
> + * Global timers queued locally while running in a task
> + * in nohz_full mode need a self-IPI to kick reprogramming
> + * in IRQ tail.
> + */
> + if (tick_nohz_full_cpu(base_local->cpu))
> + base_global->is_idle = true;
> trace_timer_base_idle(true, base_local->cpu);
> }
> *idle = base_local->is_idle;
> @@ -2364,6 +2372,8 @@ void timer_clear_idle(void)
> * path. Required for BASE_LOCAL only.
> */
> __this_cpu_write(timer_bases[BASE_LOCAL].is_idle, false);
> + if (tick_nohz_full_cpu(smp_processor_id()))
> + __this_cpu_write(timer_bases[BASE_GLOBAL].is_idle, false);
> trace_timer_base_idle(false, smp_processor_id());
>
> /* Activate without holding the timer_base->lock */
> --
> 2.44.0
>