Re: [RFC] Dynamic Tick and Deferrable Timer Support

From: john stultz
Date: Fri Jan 30 2009 - 15:30:20 EST

On Fri, 2009-01-30 at 13:04 -0600, Jon Hunter wrote:
> john stultz wrote:
> > As an aside, there are some further hardware limitations in the
> > timekeeping core that limit the amount of time the hardware can sleep.
> > For instance, the acpi_pm clocksource wraps every 2.5 seconds or so,
> > so we have to wake up periodically to sample it to avoid wrapping
> > issues.
> >
> > Just to be able to deal with all the different hardware out there, the
> > timekeeping core expects to wake up twice a second to do this
> > sampling. It may be possible to push this out if you are using other
> > clocksources (HPET/TSC), but if sleeps for longer then a second are a
> > needed feature, we probably will need some infrastructure in the
> > timekeeping core that can be queried to make sure its safe.
> The variable "max_delta_ns" is used by the dynamic tick to govern the
> maximum time a given device can sleep. Hence, this variable should be
> configured as necessary for the device you are using. Therefore, if your
> device has a timer that will wrap every 2.5 seconds, then for this
> device the "max_delta_ns" should be configure so that it does not exceed
> this time.
> So what I was proposing is that for devices that have timers that would
> allow you to sleep beyond ~2.15 seconds (current max imposed by the
> clockevent_delta2ns function), why not increase the dynamic range (make
> this a 64-bit variable) or base (ie. from nanoseconds to milliseconds)
> to permit longer sleep times for devices that can support them? This
> should not have any negative impact on devices that cannot support such
> long sleep times.

No objection to max_delta_ns being increased, but whatever code manages
it will probably need to query the timekeeping core in some fashion to
make sure the timer hardware max isn't larger then the clocksource
hardware max. I've provided a rough sketch at what the timekeeping code
would probably look like below.

> So far I have not encountered any issues with doing this. Let me know if
> this does or does not address your concerns.

There may be some other issues here, such as NTP over-correction issues
(for instance: ntp trying to correct for a 1us offset over the next
second, but ends up applying it for 10 seconds) if we defer for a really
long time. But at that point, we might as well suspend to ram, like the
OLPC does.


diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 900f1b6..2cf9ebd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -265,6 +265,21 @@ int timekeeping_valid_for_hres(void)
return ret;

+u64 timekeeping_max_deferment(void)
+ u64 max_nsecs
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ max_nsecs = cyc2ns(clock, clock->mask);
+ } while (read_seqretry(&xtime_lock, seq));
+ return max_nsecs; /* XXX maybe reduce by some amount to be safe? */
* read_persistent_clock - Return time in seconds from the persistent clock.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at