RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

From: Oza (Pawandeep) Oza
Date: Thu May 07 2015 - 04:20:01 EST


Mike,

Here is code that explains what I meant to address.
It is just a WARN_ON for the case where all CPUs other than this one are offline and, at the same time, tick_do_timer_cpu is not set to this CPU.

Note: this patch only illustrates the problem; it is not an actual patch.

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9142591..3aa4c8c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,6 +112,7 @@ static ktime_t tick_init_jiffy_update(void)
static void tick_sched_do_timer(ktime_t now)
{
int cpu = smp_processor_id();
+ int other_cpu, is_cpu_online = 0;

#ifdef CONFIG_NO_HZ_COMMON
/*
@@ -125,6 +126,11 @@ static void tick_sched_do_timer(ktime_t now)
&& !tick_nohz_full_cpu(cpu))
tick_do_timer_cpu = cpu;
#endif
+ for (other_cpu = 0; other_cpu < nr_cpu_ids; other_cpu++) {
+ if (other_cpu != cpu)
+ is_cpu_online += cpu_online(other_cpu);
+ }
+ WARN_ON((tick_do_timer_cpu != cpu) && !is_cpu_online);

/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)

Regards,
-Oza


-----Original Message-----
From: Oza (Pawandeep) Oza
Sent: Thursday, May 07, 2015 12:36 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@xxxxxxxxxxxxxxx; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

: )

Well, I am not sure the problem was communicated clearly from my side.
Let me try again.

If tick_do_timer_cpu is 0, things are fine.
If it has some other value, e.g. 1, 2 or 3, then core0 does not increment jiffies (if tick_do_timer_cpu is 1, for example, core1 increments jiffies instead).

Suppose cpu1, cpu2 and cpu3 are sent smp_send_stop, and as a result all three are stopped.

Now only cpu0 is alive, so cpu0 should increment jiffies on every timer tick.
For that, tick_do_timer_cpu should be set to 0.

That is not happening.

Regards,
-Oza


-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikbuti@xxxxxxxxx]
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@xxxxxxxxxxxxxxx; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +0000, Oza (Pawandeep) Oza wrote:
> Yes.
> But a dying kernel doesn't mean it CAN NOT INCREMENT jiffies.
> do_timer should do the job until the kernel takes its last breath, and more precisely until CPU0 takes its last breath by halting itself as its last instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and
whatever else you need to guarantee a perfect orderly death for your
box. I prefer live boxen; I would make that BUG() go away.

-Mike