Re: [RFC][PATCH] O(1) Entitlement Based Scheduler

From: Albert Cahalan
Date: Thu Feb 26 2004 - 15:28:23 EST


On Thu, 2004-02-26 at 01:19, Peter Williams wrote:
> Albert Cahalan wrote:
>> John Lee writes:

>>> The usage rates for each task are estimated using Kalman
>>> filter techniques, the estimates being similar to those
>>> obtained by taking a running average over twice the filter
>>> _response half life_ (see below). However, Kalman filter
>>> values are cheaper to compute and don't require the
>>> maintenance of historical usage data.
>>
>>
>> Linux dearly needs this. Please separate out this part
>> of the patch and send it in.
>
> This information can be determined from the SleepAVG: field in the
> /proc/<pid>/status and /proc/<tgid>/task/<pid>/status files by
> subtracting the value there from 100.

This doesn't seem to be the case. For example, a fork()
causes the value to be adjusted in both child and parent.

Also, perhaps the name is wrong, but I'd think SleepAVG
has more to do with the average length of a sleep. It sure
isn't documented. (time constant? type of decay?)

There's also a need for whole-process stats and cumulative
(sum of exited children) stats. %CPU can go as high as 51200%.

> Without our patch this value is a
> directly calculated estimated of the task's sleep rate which is
> available because it used by the O(1) scheduler's heuristics. With our
> patches, it is calculated from our estimate of the task's usage because
> we dispensed with the sleep average calculations as they are no longer
> needed. We decided to still report sleep average in the status file
> because we were reluctant to alter the contents of such files in case we
> broke user space programs.

Generally this is a good move, though I don't expect anything
to be using SleepAVG at the moment.

>> Right now, Linux does not report the recent CPU usage
>> of a process. The UNIX standard requires that "ps"
>> report this; right now ps substitutes CPU usage over
>> the whole lifetime of a process.
>>
>> Both per-task and per-process (tid and tgid) numbers
>> are needed. Both percent and permill (1/1000) units
>> get reported, so don't convert to integer percent.
>
> I think a modification to fs/proc/array.c to make this field a per
> million rather than a percent value would satisfy your needs. It would
> be a very small change but there would be concerns about breaking
> programs that rely on it being a percentage.

Nothing can rely on it existing at all, so a name change would
solve the problem of apps getting confused.

BTW, permill is not per-million, it is per-thousand.
Per-million or per-billion would be fine as long as
it doesn't overflow.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/