RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

From: Oza (Pawandeep) Oza
Date: Fri May 08 2015 - 00:16:26 EST


So Mike, is this reason strong enough for you?

I understand your point: solve the BUG, and I do tend to agree with you.

But by design and implementation, BUG() is just the beginning of the end for a dying kernel.
What happens between 'the beginning' and 'the end' is no less important
(for example, on our platform we want a clean RAMDUMP to analyze what happened, and for that we need a clean reboot).

Also,
if somebody's design is to crash the kernel deliberately (e.g. where the kernel itself is not at fault),
then I expect the tick/timekeeping framework to keep doing its job for as long as it can, and it should, because the kernel is not faulty.
But in this case it doesn't hand over the jiffies-incrementing job sanely.

In other words,
"no one can rely on jiffies; or rather, code that waits on jiffies will never make forward progress in this path"

Regards,
-Oza


-----Original Message-----
From: Oza (Pawandeep) Oza
Sent: Thursday, May 07, 2015 2:17 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@xxxxxxxxxxxxxxx; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

Oh ok.
So the reason why I cared was:

There is code in our base which relies on jiffies, but since jiffies is not incrementing, the code waits there and loops forever.
Forward progress halts (on cpu0, since that is the only CPU still alive).

We have changed the code to use mdelay and things move on.

But that means that, with the patch which I mentioned,
any code which relies on jiffies will be stuck forever and will not let the rest of the code execute, hence no forward progress,
especially if that code is running with preemption disabled via preempt_disable().
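
To make the difference concrete, here is a rough userspace sketch of the two waiting styles. It is not our actual code and not kernel code: fake_jiffies, fake_time_before(), fake_mdelay_1ms(), wait_on_jiffies() and wait_on_delay() are hypothetical stand-ins, and the spin cap exists only so the demo terminates (the real loop has no such cap, which is why it hangs).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's jiffies counter. Nothing in this demo ever
 * increments it, mimicking the state where no CPU owns the do_timer duty. */
volatile unsigned long fake_jiffies;

/* Wraparound-safe "a is before b" test, the same idea as time_before(). */
bool fake_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Stand-in for mdelay(1). The real mdelay() busy-waits on a calibrated
 * loop and does not need the tick; here we only count the calls. */
unsigned long delay_calls;
void fake_mdelay_1ms(void)
{
    delay_calls++;
}

/* Jiffies-based wait. Returns true once the deadline passes, or false
 * when max_spins polls went by without the deadline being reached. */
bool wait_on_jiffies(unsigned long timeout_ticks, unsigned long max_spins)
{
    unsigned long deadline = fake_jiffies + timeout_ticks;

    assert(max_spins > 0);
    while (fake_time_before(fake_jiffies, deadline)) {
        if (max_spins-- == 0)
            return false;   /* jiffies stalled: the real loop spins forever */
    }
    return true;
}

/* Delay-based wait: bounded by its own loop count, so it always returns.
 * Returns the number of milliseconds waited. */
unsigned int wait_on_delay(unsigned int timeout_ms)
{
    unsigned int waited = 0;

    while (waited < timeout_ms) {
        fake_mdelay_1ms();
        waited++;
    }
    return waited;
}
```

With fake_jiffies frozen, wait_on_jiffies(10, 1000) gives up and returns false, while wait_on_delay(10) completes after ten delay calls. That is the behaviour we saw after moving to mdelay.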

Regards,
-Oza


-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikbuti@xxxxxxxxx]
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@xxxxxxxxxxxxxxx; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +0000, Oza (Pawandeep) Oza wrote:
> : )
>
> Well, I am not sure, if problem was communicated clearly from my side.

I understood. I just don't understand why you'd care deeply whether
CPU0 halts or eternally waits. Both render it harmless and useless.

-Mike
