Re: [Issue] timer callback registered with mod_timer is getting called beforetime

From: Frederic Weisbecker
Date: Fri Sep 24 2021 - 06:38:16 EST


Hi Maninder,

On Fri, Sep 24, 2021 at 12:23:10PM +0530, Maninder Singh wrote:
> Hi,
>
>
> we are facing issue of early invocation of timer callback(kernel versio 5.4), Is it known issue for timer callback?
>
> As we checekd code if CPU responsible for updating jiffies value(lets say CPU0) has interrupts disabled for 60 ms
> and in mean time other CPU(CPU1) registers new timer for (jiffies + x) value. its callback will be called before time.
>
> As it reads old jiffies value and register callback for x jiffies afterwards, but instead it is called
> at x - y jiffies. (y is lapsed period which CPU 1 does not know from jiffies)
>
>
> CPU 0 CPU 1
>
> // last jiffies updated
> local_irq_disable(); do_some_work()
> do_work() ....
> ... ....
> ... ....
> ... // 60 ms mod_timer(&timer, jiffies + x); // read old jiffies value
> local_irq_enable() ...
> // new jiffies updated with // timer interrupt reads updated jiffies and calls callback function where x was not actually passed on HW clock.
> // jiffies + 16
>
>
> we tried with TC (with 500 ms irq disable just to check behaviour, in actual issue it was 60 ms IRQ disable)
>
> // IRQs disabled on same CPU0 responsible for jiffies updation for 500 mili seconds
>
> [23.5598] ##### disable IRQs
> [24.0600] ##### registered timer 25025334960 5887 //registered timer at HW clock 25.025 seconds for 600 ms at 5887 jiffies on CPU1
>
> [24.0604] ##### ticks change abruptly 5887 124 //IQRS enabled, CPU0 increments 124 jiffies colllectively because of irq disable long time
>
> [24.1595] ##### timer called 25124783406 6037 // as per jiffies callback trigger at 6037 jiffies i.e. after 150 jiffies(600ms)
> // but HW clock is passed only 100 milliseconds.
>
>
>
> Is it known behaviour for timers?
> because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.
>
> we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
> but then no issue of abrupt jiffies change occured on system.

First of all, are you meeting this issue specifically on NOHZ_FULL? Because
there is a pending fix for a related matter there:

https://lore.kernel.org/lkml/20210915142303.24297-1-frederic@xxxxxxxxxx/

As for what you're reporting here, I think the core problem is the fact that the
timekeeper (jiffies updater) is stuck with IRQs disabled for way too long. Even
one millisecond is too much to tolerate. Do you have any idea about the source of
that situation?

Thanks.