Re: [RFC][PATCH] new timeofday core subsystem (v.A0)

From: George Anzinger
Date: Wed Sep 15 2004 - 03:08:59 EST


Christoph Lameter wrote:
On Tue, 14 Sep 2004, George Anzinger wrote:


u64 time_source_to_ns(u64 x) {
return (((x-time_source_at_base) & time_source->mask)*time_source->multiply) >> time_source->shift;
}

This seems to assume that the time souce is incrementing. On some archs, I
think, it decrements...


This could be handled by a function that transforms the value read from
the counter into an incrementing value. I.e.

u64 get_rev_timerval(void) {
return 1<< 55 - readq(TIMER_PORT);
}


So we would do "time_adjust_skip(0);" to update time_source_at_base?


There is no reason to update time_source_at_base unless adjustments need
to be done or a danger exists of the counter wrapping around (16 bit
counter?)

Yes, for example the pm counter is 24 bits. A lot of platforms have 32 bit counters...


If we do a "good" job of choosing <multiply> and <shift> this will be a "very"
small change. Might be better to pass in a "delta" to change it by. Then you
would only need one function.


These are the raw routines. Higher level function could translate a delta
into the appropriate adjustments.


The mask and the shift value are not really related. The mask is a function of
the number of bits the hardware provides. The shift is related to the value of
freq. Me thinks they should not be tied together here.


They are related because the maximum shift for a 64 bit value without
loosing bits is 64 - number of significant bits. This basically insures
maximum precision when scaling the counter.

Lets assume the pm counter which has 24 bits. This means your shift is 40 bits. In "s->multiply = (NSEC_PER_SEC << s->shift) / freq;" you will have an overflow.
Here you need to keep (NSEC_PER_SEC << s->shift) in 64 bits AND multiply must also be 32 bits or less. I really don't think you can choose the scale so easily.



/* Values in use in the kernel and how they may be derived from xtime */
#define jiffies (now()/1000000)

This assumes HZ=1000. (Assuming there is an HZ any more, that is.) Not all
archs will want this value. Possibly:
#define jiffies ((now() * HZ) / 1000000000)


Right. I just thought of the standard case HZ=1000.


u64 get_cpu_time_filtered() {
u64 x;
u64 l;

This will need to be "static";


Nope. time_source_last is the global. l is just a copy of
time_source_last.

Right, I miss read the function. cycles() should be now() if I am reading this right.


Ok, so now lets hook this up with interval timers:

#define ns_per_jiffie (NSEC_PER_SEC / HZ)
#define jiffies_to_ns(jiff) (jiff * ns_per_jiffie)

This function is a request to interrupt at the next jiffie after the passed
reference jiffie. If that time is passed return true, else false.


One could do this but we want to have a tickless system. The tick is only
necessary if the time needs to be adjusted.

I really think a tickless system, for other than UML systems, is a loosing thing. The accounting overhead on context switch (which increases as the number of switchs per second) will cause more overhead than a periodic accounting tick once a respectable load appears. The periodic accounting tick has a flat overhead that does not depend on load.

But you are right there is the need for timer event scheduling that is
not included yet. This should be a method of the time source.

I am not sure that is the right thing to do here. For example, on SMP systems today we have a timer event interrupt per cpu. This is much more scaleable and not so easy to do if we tie it to the time source. All we need is a reasonably accurate short term counter.
--
George Anzinger george@xxxxxxxxxx
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/