Re: [PATCH] hangcheck-timer is broken on x86

From: john stultz
Date: Mon Mar 29 2010 - 12:43:51 EST


On Mon, 2010-03-29 at 10:11 -0400, Yury Polyanskiy wrote:
> On Sun, 28 Mar 2010 18:00:36 -0700
> john stultz <johnstul@xxxxxxxxxx> wrote:
>
> > > 1) Does getrawmonotonic() satisfy hangcheck-timer? What I mean is, will
> > > it always return the wallclock nanoseconds even in the face of CPU speed
> > > changes, suspend, udelay, or any other suspension of kernel operation?
> > > Yes, I know this is a tougher standard than rdtsc(), but this is what
> > > hangcheck-timer wants. rdtsc() at least satisfied udelay and PCI hangs.
> >
> > getrawmonotonic() can be stalled and will wrap on some hardware (acpi pm
> > timer wraps every 5 seconds).
> >
>
> I am not sure which archs do you mean. But in any case,
> getrawmonotonic() is not just a wrap around a call to rdtsc() (or acpi
> pm timer read). It is based on the clock->raw_time, which is updated
> every timer interrupt by the update_wall_time(). So even if underlying
> timer wraps, it doesn't lead to getrawmonotonic() returning 0 sec.

What I'm saying is that if you're using getrawmonotonic() to detect
hangs, you might miss them. If the timer interrupt is delayed long
enough, the underlying clocksource counter can wrap before
update_wall_time() accumulates it, so getrawmonotonic() stops
increasing in step with real time. This does not apply to systems
using the TSC clocksource, but it does apply to systems using acpi_pm.
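
To illustrate the wrap problem, here is a rough user-space model, not
kernel code. It assumes a 24-bit ACPI PM timer at 3.579545 MHz (some
hardware has a 32-bit one), and pm_counter_at() and observed_ns() are
made-up helper names. It only shows what a delta between two counter
reads looks like when nothing accumulates the counter in between.

```c
#include <stdint.h>

/* ACPI PM timer frequency; the 24-bit width is an assumption. */
#define PM_TIMER_HZ   3579545ULL
#define PM_TIMER_MASK 0xFFFFFFULL

/* Hypothetical helper: value of a free-running masked counter
 * after `ns` nanoseconds of real time. */
static uint64_t pm_counter_at(uint64_t ns)
{
    return (ns * PM_TIMER_HZ / 1000000000ULL) & PM_TIMER_MASK;
}

/* Hypothetical helper: elapsed nanoseconds as computed from only the
 * previous and current counter readings, i.e. with no timer interrupt
 * accumulating the counter in between. */
static uint64_t observed_ns(uint64_t prev_ns, uint64_t now_ns)
{
    uint64_t delta = (pm_counter_at(now_ns) - pm_counter_at(prev_ns))
                     & PM_TIMER_MASK;

    return delta * 1000000000ULL / PM_TIMER_HZ;
}
```

With these assumptions the counter wraps roughly every 4.7 seconds. A
1 second gap between reads is measured correctly, but a 10 second stall
spans two wraps and shows up as well under 1 second, which a hang
detector would not flag.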

read_persistent_clock() is likely a better interface to use, because it
returns seconds, usually read from the CMOS clock, which runs
continuously without any OS maintenance.

The only complication there would likely be hwclock syncing (either by
hand or via NTP), but that could be handled by a
touch_hangcheck_watchdog() style notifier.
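
Something along these lines is what I have in mind, sketched as plain
C rather than real kernel code. struct hangcheck, hangcheck_touch() and
hangcheck_expired() are hypothetical names, and the seconds value is
assumed to come from a persistent clock such as the one
read_persistent_clock() reports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a hang check driven by a persistent
 * seconds counter. */
struct hangcheck {
    int64_t last_sec;   /* persistent-clock seconds at last check/touch */
    int64_t margin_sec; /* largest gap between checks that is tolerated */
};

/* Called when the persistent clock is stepped (hwclock by hand or NTP
 * sync) so that the jump is not mistaken for a hang. */
static void hangcheck_touch(struct hangcheck *hc, int64_t now_sec)
{
    hc->last_sec = now_sec;
}

/* Called from the periodic check; returns true if more than margin_sec
 * passed since the previous check or touch, then rearms. */
static bool hangcheck_expired(struct hangcheck *hc, int64_t now_sec)
{
    bool hung = (now_sec - hc->last_sec) > hc->margin_sec;

    hc->last_sec = now_sec;
    return hung;
}
```

The point of the touch hook is just that a clock step resets the
baseline. Without it, setting the hardware clock forward would look the
same as the machine having been stalled for that long.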

thanks
-john

