Re: [ANNOUNCE] ktimers subsystem

From: Roman Zippel
Date: Wed Sep 21 2005 - 14:51:47 EST

Next message: Hugh Dickins: "Re: [PATCH 2.6.14-rc2] fix incorrect mm->hiwater_vm and mm->hiwater_rss"
Previous message: Andrew Morton: "Re: [PATCH 07/10] uml: avoid fixing faults while atomic"
In reply to: George Anzinger: "Re: [ANNOUNCE] ktimers subsystem"
Next in thread: Thomas Gleixner: "Re: [ANNOUNCE] ktimers subsystem"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

On Mon, 19 Sep 2005 tglx@xxxxxxxxxxxxx wrote:

First a quick question:

> This revealed a reasonable explanation for this behaviour. Both
> networking and disk I/O arm a lot of timeout timers (the maximum number
> of armed timers during the tests observed was ~400000).

This triggers the obvious question: where are these timers coming from?
You don't think that having that much timers in first place is little
insane (especially if these are kernel timers)?

Ok, now to the long part:

> The conclusions of my recent work on Linux time(rs) related problems and
> the analysis of related patches are:
>
> 1. The HZ/jiffy based usage of time in the kernel code has to be
> converted to human time units.
>
> 2. A clean seperation of all related APIs and subsystems is necessary
> even if they have interdependencies and shared functionality

How you get to these conclusions is still rather unclear, I don't even
really know what the problem is from just reading the pretext. You talk
about previous efforts, you talk about users abusing the timer system, but
what is actually the problem of the timer system itself?

Later it becomes clear that you want high resolution timers, what doesn't
become clear is why it's such an essential feature that everyone has to
pay the price for it (this does not only include changes in runtime
behaviour, but also the proposed API changes).

What is seriously missing here is the big picture. First off how does it
currently look like? You rather shortly mention scheduler ticks and your
analysis basically says only that it's "a bunch of ugliness". There is no
mention why they are needed in first place and there is no real
explanation why they are such a big problem for high resolution timers.

Second, an API cleanup is all nice, but the more interesting part is still
what is behind this API and this part you pretty much leave in the dark.
Basically how does the new big picture look like and how do high
resolution timer fit into it? (You are more busy defending the 64bit math,
than actually explaining why and where it's needed in the first place.)

Sorry, if this sounds harsh, but your announcement is more a random
collection of information about timers than an explanation of why ktimers
are desirable. I'm not against high resolution timers per se, but this
doesn't explain why it has to be high resolution all the way. It also
doesn't explain how it will interact with Johns work, e.g. I'm only scared
if I see this in the ktimer_hres patch:

+extern int arch_hrtimer_init(int highres);
+extern int arch_hrtimer_reprogram(nsec_t expires);
+extern void arch_hrtimer_trigger_ints(void);

Ok, so what's missing? From a basic design overview I would expect some
information about types of time within the kernel and their relationship.
We basically have three types:
- scheduler time
- wallclock time
- process time

The scheduler time (aka jiffies) is not just used for timeouts, it's the
basic time unit to schedule cpu time. It's major requirement is simplicity
- a 32bit value can always be read without locking and calculations based
on it are simple.
I exclude posix clocks here as it can be used with both wallclock and
process time. The main difference between them is that the latter is user
programmable. Here we get to the core problem of timer ticks: the current
timer system is designed around a simple timer model, which is not
reprogrammable, so the timer resolution available to user space is
limited to the timer tick resolution.

Johns patches now introduce two major new concepts as a generic mechanism
(and not just hidden somewhere in arch code): 1) a timer source
abstraction, 2) making wallclock updates independent of the timer tick.
BTW here you completely miss the "main point of criticizm", the 64bit math
is a problem, but the main problem is that he completely changed the NTP
kernel model. I don't deny that the NTP code could use some updates
itself, but that's a completely separate problem. Regarding the timer
system it's only important how to synchronize NTP time with the kernel
wallclock time, as soon as you get that right, the whole 64bit math
problematic becomes irrelevant.

The existence of the timer source abstraction is a major requirement for
further improvements (in this regard it's already suspicious, that you put
major changes before Johns patch). The next major change would be to add
the possibility to reprogram a timer source, the scheduler can use this to
skip timer ticks and e.g. itimer can offer higher resolution timers. The
main point here is before we get to any API decisions, we need to develop
a model how a single time source can drive multiple users. Your split
between user timers and kernel timeouts leaves this question completely
open.

The next step (_after_ reprogrammable timer sources) would be increasing
the timer resolution. Here I'm not at all convinced, that we need to
change everything to nanosecond resolution, we can easily make this a
config option which either ties process time resolution to scheduler time
or makes it independent. The first would make process time a 32bit ms
value (basically current behaviour), the latter can make it to a 64bit ns
value. Anyone trying to introduce nsec_t in common code really needs to
come up with some better arguments why calculations in ns are necessary
unconditionally, instead of making the resolution configurable.

In summary please provide a larger picture for your changes, it's
especially important to desribe the relationship between the various
systems. The API definition is only the last step and is derived from
these relationships.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Hugh Dickins: "Re: [PATCH 2.6.14-rc2] fix incorrect mm->hiwater_vm and mm->hiwater_rss"
Previous message: Andrew Morton: "Re: [PATCH 07/10] uml: avoid fixing faults while atomic"
In reply to: George Anzinger: "Re: [ANNOUNCE] ktimers subsystem"
Next in thread: Thomas Gleixner: "Re: [ANNOUNCE] ktimers subsystem"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]