RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

From: Oza (Pawandeep) Oza
Date: Thu May 07 2015 - 00:38:22 EST


Hi Mike,

Let me explain the problem again.

Problem Statement: timekeeping is stopped; do_timer is no longer cpu0's job.

The reason: the variable "tick_do_timer_cpu" is not set to the correct CPU (cpu0).
When BUG() happens, tick_do_timer_cpu stays set to 1, 2 or 3 (we have 4 cores).
As a result, any code running on core0 that relies on jiffies incrementing doesn't work, because there is nobody to increment jiffies.

There is tick_handover_do_timer, and if that is called then things are fine, but it is not getting called because it is tightly coupled with hotplug.
Since cpu_down is not getting called, this handover does not happen, and tick_do_timer_cpu is left
pointing to a DEAD cpu (1, 2 or 3). core0 then waits forever (wherever the code relies on jiffies incrementing).
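
To illustrate the failure mode described above, here is a small userspace model in C. It is not kernel code: the struct and the model_* function names are made up for this sketch, and only tick_do_timer_cpu, jiffies and the idea of a handover come from the discussion. It assumes that only the CPU named in tick_do_timer_cpu advances jiffies on its tick, and that a handover simply picks the first CPU that is still online.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4
#define TICK_DO_TIMER_NONE (-1)

/* Hypothetical userspace model of the relevant state; not kernel code. */
struct tick_model {
    unsigned long jiffies;
    int tick_do_timer_cpu;   /* CPU currently responsible for advancing jiffies */
    bool online[NR_CPUS];
};

void model_init(struct tick_model *m, int owner)
{
    assert(owner >= 0 && owner < NR_CPUS);
    m->jiffies = 0;
    m->tick_do_timer_cpu = owner;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        m->online[cpu] = true;
}

/* One periodic tick on `cpu`: only the owning, still-running CPU advances jiffies. */
void model_tick(struct tick_model *m, int cpu)
{
    if (m->online[cpu] && m->tick_do_timer_cpu == cpu)
        m->jiffies++;
}

/*
 * Stop a CPU. With handover == false the CPU is halted without going
 * through the hotplug path, so the owner variable is left untouched.
 * With handover == true the duty moves to the first CPU still online
 * (assumed policy for this sketch).
 */
void model_stop_cpu(struct tick_model *m, int cpu, bool handover)
{
    m->online[cpu] = false;
    if (!handover || m->tick_do_timer_cpu != cpu)
        return;

    m->tick_do_timer_cpu = TICK_DO_TIMER_NONE;
    for (int c = 0; c < NR_CPUS; c++) {
        if (m->online[c]) {
            m->tick_do_timer_cpu = c;
            break;
        }
    }
}

/* Stop CPUs 1..3, then deliver `ticks` ticks to every CPU; return final jiffies. */
unsigned long model_run(int owner, bool handover, int ticks)
{
    struct tick_model m;

    model_init(&m, owner);
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        model_stop_cpu(&m, cpu, handover);

    for (int t = 0; t < ticks; t++)
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            model_tick(&m, cpu);

    return m.jiffies;
}
```

In this model, with the duty initially on cpu2, model_run(2, false, 100) returns 0: the owner variable still points at the stopped CPU, so cpu0 keeps ticking but jiffies never moves. model_run(2, true, 100) returns 100, because the handover moves the duty to cpu0 before the ticks are delivered.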

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikbuti@xxxxxxxxx]
Sent: Thursday, May 07, 2015 8:53 AM
To: pawandeep oza
Cc: linux-kernel@xxxxxxxxxxxxxxx; malayasen rout; Oza (Pawandeep) Oza
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Wed, 2015-05-06 at 22:57 +0530, pawandeep oza wrote:

> but when say core0 has raised BUG..
...

> what is the right way to approach this problem

Look at the spot BUG() printed? BUG() means "Way to go slick, the code
you fed me (file:line) is toxic. Have a nice day, your ex-buddy core0".

-Mike