Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full

From: Paul E. McKenney
Date: Wed Mar 20 2024 - 07:17:12 EST


On Tue, Mar 19, 2024 at 02:18:00AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 19, 2024 at 12:07:29AM +0100, Frederic Weisbecker wrote:
> > While running in nohz_full mode, a task may enqueue a timer while the
> > tick is stopped. However the only places where the timer wheel,
> > alongside the timer migration machinery's decision, may reprogram the
> > next event accordingly with that new timer's expiry are the idle loop or
> > any IRQ tail.
> >
> > However neither the idle task nor an interrupt may run on the CPU if it
> > resumes busy work in userspace for a long while in full dynticks mode.
> >
> > To solve this, the timer enqueue path raises a self-IPI that will
> > re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
> > avoids potential locking inversion.
> >
> > This is supposed to happen both for local and global timers but commit:
> >
> > b2cf7507e186 ("timers: Always queue timers on the local CPU")
> >
> > broke the global timers case with removing the ->is_idle field handling
> > for the global base. As a result, global timers enqueue may go unnoticed
> > in nohz_full.
> >
> > Fix this with restoring the idle tracking of the global timer's base,
> > allowing self-IPIs again on enqueue time.
>
> Testing with the previous patch (1/2 in this series) reduced the number of
> problems by about an order of magnitude, down to two sched_tick_remote()
> instances and one enqueue_hrtimer() instance, very good!
>
> I have kicked off a test including this patch. Here is hoping! ;-)

And 22*100 hours of TREE07 got me one run with a sched_tick_remote()
complaint and another run with a starved RCU grace-period kthread.
So this is definitely getting more reliable, but still a little ways
to go.

Thanx, Paul

> > Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > Fixes: b2cf7507e186 ("timers: Always queue timers on the local CPU")
> > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > ---
> > kernel/time/timer.c | 12 +++++++++++-
> > 1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index e69e75d3858c..dee29f1f5b75 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -642,7 +642,8 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> > * the base lock:
> > */
> > if (base->is_idle) {
> > - WARN_ON_ONCE(!(timer->flags & TIMER_PINNED));
> > + WARN_ON_ONCE(!(timer->flags & TIMER_PINNED ||
> > + tick_nohz_full_cpu(base->cpu)));
> > wake_up_nohz_cpu(base->cpu);
> > }
> > }
> > @@ -2292,6 +2293,13 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
> > */
> > if (!base_local->is_idle && time_after(nextevt, basej + 1)) {
> > base_local->is_idle = true;
> > + /*
> > + * Global timers queued locally while running in a task
> > + * in nohz_full mode need a self-IPI to kick reprogramming
> > + * in IRQ tail.
> > + */
> > + if (tick_nohz_full_cpu(base_local->cpu))
> > + base_global->is_idle = true;
> > trace_timer_base_idle(true, base_local->cpu);
> > }
> > *idle = base_local->is_idle;
> > @@ -2364,6 +2372,8 @@ void timer_clear_idle(void)
> > * path. Required for BASE_LOCAL only.
> > */
> > __this_cpu_write(timer_bases[BASE_LOCAL].is_idle, false);
> > + if (tick_nohz_full_cpu(smp_processor_id()))
> > + __this_cpu_write(timer_bases[BASE_GLOBAL].is_idle, false);
> > trace_timer_base_idle(false, smp_processor_id());
> >
> > /* Activate without holding the timer_base->lock */
> > --
> > 2.44.0
> >