Re: Stolen and degraded time and schedulers

From: Daniel Walker
Date: Wed Mar 14 2007 - 15:00:36 EST


On Wed, 2007-03-14 at 11:41 -0700, Jeremy Fitzhardinge wrote:
> Daniel Walker wrote:
> > >From prior emails I think your suggesting that 1ms (or 5 or 10) of time
> > should actually be a variable X that is changed inside sched_clock().
> > That's not the purpose of that API call, sched_clock() measure real time
> > period.
> >
>
> To what purpose? What is it really measuring? My understanding is that
> its for the scheduler to work out how much time a process actually ran
> for. Aside from its use in printk as a general monotonic timestamp,
> this seems to be how it gets used everywhere. If I change it to return
> cpu-ns (ie, make it not count time that the cpu was stolen by the
> hypervisor), then it will return what its callers actually want to know.

sched_clock is used to bank real time against some specific states
inside the scheduler, and no it doesn't _just_ measure a processes
executing time.

> > After reading your emails it sounds like what you really want is similar
> > to accurate state accounting which is used for scheduling purposes. Part
> > of that has already been implemented at least twice that I know of.
> > Accounting real time against specific states was done in two version of
> > microstate accounting. Those are fine starting points for the changes
> > you are wanting.
>
> I haven't looked at the microstate accounting patches in any detail, but
> I'm assuming that they take a timestamp at each CPU state transition and
> use that to account time to the appropriate entities (tell me if I'm
> missing something pertinent here). There are two problems with that
> approach in this case:
>
> 1. If the cpu is stolen by the hypervisor, the kernel will get no
> state transition notification. It can generally find out that
> some time was stolen after the fact, but there's no specific event
> at the time it happens.

The hypervisor would need to do it's own accounting I'd imagine then
provide that to the scheduler.

> 2. It doesn't map particularly well to a cpu changing speed. In
> particular if a cpu has continuously varying execution speed
> (Transmeta?), then the best you can hope for is the integration of
> cpu work done over a time period rather than discrete cpu
> speed-change events.

True, but as I said in my original email it's not trivial to follow
physical cpu speed changes since the changes are free form and change
potentially per system. Your better off do it just with the hypervisor
since you can control it ..

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/