Re: [watchdog] combine nmi_watchdog and softlockup

From: Cyrill Gorcunov
Date: Fri Apr 09 2010 - 10:57:07 EST


On Fri, Apr 09, 2010 at 02:00:38AM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 06, 2010 at 07:31:15PM +0400, Cyrill Gorcunov wrote:
> > > I fear the cpu clock is not going to help you detecting any hard lockups.
> > > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > > not going to fire.
> > >
> >
> > I guess it's not supposed to. For such cases only nmi irqs may help for which
> > the perf events are there (/me need to check if we program apic timer for anything
> > like that). But it should help for other deadlocks. Or I miss something?
>
>
> Actually not. What the hardlockup detector does is check the progression
> of irqs.
>

Yup, I know what the nmi-watchdog is doing. I guess you've misunderstood me. I
meant that the sw-driven detector is not supposed to guard against the cases
you're referring to. I don't remember the details, but someone proposed falling
back to the sw-watchdog if there is no way to use nmi from perf-events (for any
reason), which was eventually implemented in Don's patch. And there will be a
message that the watchdog has been switched to the sw-driven scaffold, so the
user will (or should) see this message and take note of it, I believe.
The sw-watchdog is like "ok, we tried our best, but there is a problem and the
only solution we can offer is to use the sw-watchdog".
That is how I understand the reason for the sw-watchdog being there.
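
Roughly, the fallback I have in mind looks like the sketch below. This is only
an illustration and not code from Don's patch: watchdog_pick_mode(), the
wd_mode enum and the message text are all made-up names, and nmi_event_err
stands for whatever error the attempt to set up the NMI perf event returned.

```c
#include <stdio.h>

/* Hypothetical modes: NMI-driven detector vs. sw-driven fallback. */
enum wd_mode { WD_NMI, WD_SOFT };

/*
 * Hypothetical helper: nmi_event_err is 0 if the NMI perf event was set up,
 * non-zero otherwise. On failure, tell the user and fall back to sw mode.
 */
static enum wd_mode watchdog_pick_mode(int nmi_event_err)
{
	if (nmi_event_err == 0)
		return WD_NMI;

	/* Message wording is an assumption, not the real kernel message. */
	fprintf(stderr,
		"watchdog: NMI event unavailable (err %d), using sw-driven mode\n",
		nmi_event_err);
	return WD_SOFT;
}
```

The only point is that the switch is visible to the user, not silent.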

>
> So it detects true hardlockups: stuck in an irq disabled section.
> If you don't have NMI to detect that (here this is done by a hardware clock
> based on cpu cycle overflows), then you're screwed. The hardlockup detector
> is useless with a maskable irq based clock.
>
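
For reference, the "progression of irqs" check can be sketched in plain C as
below. It is a user-space illustration under my own assumptions, not the
kernel code: struct cpu_watch, timer_irq_tick() and nmi_check_hardlockup() are
hypothetical names, and a real detector keeps this state per cpu.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cpu state; names are illustrative only. */
struct cpu_watch {
	uint64_t irq_count;       /* bumped by the maskable timer irq */
	uint64_t irq_count_saved; /* value seen at the previous NMI */
};

/* Runs from the maskable timer irq: shows that irqs are still serviced. */
static void timer_irq_tick(struct cpu_watch *w)
{
	w->irq_count++;
}

/*
 * Runs from the NMI. Returns true if no timer irq has run since the
 * previous NMI, i.e. the cpu looks stuck with irqs disabled.
 */
static bool nmi_check_hardlockup(struct cpu_watch *w)
{
	if (w->irq_count == w->irq_count_saved)
		return true;

	w->irq_count_saved = w->irq_count;
	return false;
}
```

If nmi_check_hardlockup() were itself called from a maskable irq, it would
never run while irqs are disabled, which is exactly the case it has to catch.
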
-- Cyrill