Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

From: Aaron Durbin
Date: Thu Aug 09 2007 - 17:16:58 EST


On 8/9/07, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 8 Aug 2007 16:17:19 -0700
> Aaron Durbin <adurbin@xxxxxxxxxx> wrote:
>
> > In setup_APIC_timer with the HPET in use, a condition can arise while
> > waiting for the next irq slice to expire on the HPET which will either
> > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> > watchdog is disabled.
> >
> > The HPET comparator and the counter keep incrementing during its normal
> > operation. When a comparison event fires the comparator will increment
> > by the designated period. If the HPET trigger occurs right after
> > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> > counter to wrap around. However, when the NMI watchdog is enabled the
> > NMI watchdog will detect a lockup and reboot the machine. This
> > scenario can be exasperated by the presence of an SMI which will
> > increase the window of opportunity for the condition to occur.
> >
> > The fix is to wait for the compartor to change which signals the
> > end of the tick slice.
> >
> > ---
> >
> > The last patch had a typo in the diff which really would cause the
> > problem state above. Sorry.
> >
> > diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> > index 900ff38..06797e2 100644
> > --- a/arch/x86_64/kernel/apic.c
> > +++ b/arch/x86_64/kernel/apic.c
> > @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
> >
> > /* wait for irq slice */
> > if (hpet_address && hpet_use_timer) {
> > + /*
> > + * Wait for the comparator value to change which signals that
> > + * the tick slice has expired.
> > + */
> > int trigger = hpet_readl(HPET_T0_CMP);
> > - while (hpet_readl(HPET_COUNTER) >= trigger)
> > - /* do nothing */ ;
> > - while (hpet_readl(HPET_COUNTER) < trigger)
> > + while (trigger == hpet_readl(HPET_T0_CMP))
> > /* do nothing */ ;
> > } else {
> > int c1, c2;
>
> but, but. Didn't this get fixed by
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
> ?
>

That original patch looks good to me as well. I just checked git head to see if
this was already fixed upstream, and it didn't appear to be. Sorry for the
duplicate patch.

-Aaron
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/