Re: RCU CPU stall console spews leads to soft lockup disabled is reasonable ?

From: Don Zickus
Date: Tue Jan 20 2015 - 10:25:53 EST


On Tue, Jan 20, 2015 at 11:09:19AM +0800, Zhang Zhen wrote:
>
> > Of course back then, touch_nmi_watchdog touched all cpus. So a problem
> > like this was masked. I believe this upstream commit 62572e29bc53, solved
> > the problem.
>
> Thanks for your suggestion.
>
> Commit 62572e29bc53 changed the semantics of touch_nmi_watchdog and made it
> only touch the local cpu, not every one.
> But watchdog_nmi_touch = true only guarantees no hard lockup check on this cpu.
>
> Commit 62572e29bc53 didn't change the semantics of touch_softlockup_watchdog.

Ah, yes. I reviewed the commit too quickly yesterday. I thought
touch_softlockup_watchdog was called on every cpu and that commit changed
it to the local cpu. But that was incorrect.

> >
> > You can apply that commit and see if you get both RCU stall
> > messages _and_ softlockup messages. I believe that is what you were
> > expecting, correct?
> >
> Correct, I expect to get both RCU stall messages _and_ softlockup messages.
> I applied that commit, and I only got RCU stall messages.

Hmm, I believe the act of printing to the console calls touch_nmi_watchdog
which calls touch_softlockup_watchdog. I think that is the problem there.

This may not cause other problems but what happens if you comment out the
'touch_softlockup_watchdog' from the touch_nmi_watchdog function like
below (based on latest upstream cb59670870)?

The idea is that console printing for that cpu won't reset the softlockup
detector. Again other bad things might happen and this patch may not be a
good final solution, but it can help give me a clue about what is going
on.
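
To make the suspected interaction concrete, here is a rough user-space
sketch in C. It is not kernel code: the struct, the model_* helpers, the
20 second threshold and the print interval are all assumptions made up for
illustration, and the real watchdog handles timestamps differently. It only
models a stuck cpu whose periodic console output either does or does not
reset the softlockup timestamp.

```c
#include <stdbool.h>

/* Assumed soft lockup threshold in seconds, for illustration only. */
#define MODEL_SOFTLOCKUP_THRESH 20UL

/* Hypothetical per-cpu state; a simplified stand-in for the kernel's. */
struct cpu_model {
    unsigned long touch_ts; /* last time the softlockup timestamp was reset */
    bool nmi_touch;         /* models the per-cpu watchdog_nmi_touch flag */
};

/* Models touch_softlockup_watchdog(): restart the soft lockup window. */
static void model_touch_softlockup(struct cpu_model *cpu, unsigned long now)
{
    cpu->touch_ts = now;
}

/*
 * Models touch_nmi_watchdog(). With also_touch_soft == true it behaves like
 * the current code; with false it behaves like the test patch, where the
 * touch_softlockup_watchdog() call is commented out.
 */
static void model_touch_nmi(struct cpu_model *cpu, unsigned long now,
                            bool also_touch_soft)
{
    cpu->nmi_touch = true;
    if (also_touch_soft)
        model_touch_softlockup(cpu, now);
}

/* Soft lockup check: has the timestamp gone stale past the threshold? */
static bool model_is_softlockup(const struct cpu_model *cpu, unsigned long now)
{
    return now - cpu->touch_ts > MODEL_SOFTLOCKUP_THRESH;
}

/*
 * Simulate a cpu that is stuck for `duration` seconds while something prints
 * to the console every `print_interval` seconds (0 means no printing).
 * Returns true if the modelled soft lockup detector would have fired.
 */
static bool simulate_stall(unsigned long duration, unsigned long print_interval,
                           bool also_touch_soft)
{
    struct cpu_model cpu = { .touch_ts = 0, .nmi_touch = false };

    for (unsigned long now = 1; now <= duration; now++) {
        if (print_interval && now % print_interval == 0)
            model_touch_nmi(&cpu, now, also_touch_soft);
        if (model_is_softlockup(&cpu, now))
            return true;
    }
    return false;
}
```

In this model, simulate_stall(60, 10, true) never reports a soft lockup
because each print restarts the window, while simulate_stall(60, 10, false)
does report one. Whether the real system behaves the same way is exactly
what the test patch is meant to find out.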

Cheers,
Don

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..833c015 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -209,7 +209,7 @@ void touch_nmi_watchdog(void)
 	 * going off.
 	 */
 	raw_cpu_write(watchdog_nmi_touch, true);
-	touch_softlockup_watchdog();
+	//touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
