Re: [RFC][PATCH] O(1) Entitlement Based Scheduler

From: Peter Williams
Date: Thu Feb 26 2004 - 18:34:07 EST

Next message: Chris Wright: "Re: shmget with SHM_HUGETLB flag: Operation not permitted"
Previous message: Andrew Morton: "Re: /proc visibility patch breaks GDB, etc."
In reply to: Albert Cahalan: "Re: [RFC][PATCH] O(1) Entitlement Based Scheduler"
Next in thread: Peter Williams: "Re: [RFC][PATCH] O(1) Entitlement Based Scheduler"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Albert Cahalan wrote:

On Thu, 2004-02-26 at 01:19, Peter Williams wrote:

Albert Cahalan wrote:

John Lee writes:

The usage rates for each task are estimated using Kalman
filter techniques, the estimates being similar to those
obtained by taking a running average over twice the filter
_response half life_ (see below). However, Kalman filter
values are cheaper to compute and don't require the
maintenance of historical usage data.

Linux dearly needs this. Please separate out this part
of the patch and send it in.

This information can be determined from the SleepAVG: field in the /proc/<pid>/status and /proc/<tgid>/task/<pid>/status files by subtracting the value there from 100.

This doesn't seem to be the case. For example, a fork()
causes the value to be adjusted in both child and parent.

This would be the case with our patch as well as we make children inherit their parent's usage rate to partially reduce the effect of ramp up in the estimation of the child's CPU usage rate.

Also, perhaps the name is wrong, but I'd think SleepAVG
has more to do with the average length of a sleep. It sure
isn't documented. (time constant? type of decay?)

My reading of the code caused me to interpret it as a percentage sleep rate i.e. a value of 50 means the task is sleeping 50% of the time. And this has made me realise that without our patches using (100 - SleepAVG) would not really give you CPU usage rate but would instead give RUNNABILITY rate (i.e. the proportion of time the task is spending on the cpu OR on a runqueue waiting for cpu access.

It also makes me realise that the SleepAVG our patch reports is NOT really a sleep rate it's a sleep OR waiting on a runqueue rate.

There's also a need for whole-process stats and cumulative
(sum of exited children) stats. %CPU can go as high as 51200%.

Without our patch this value is a directly calculated estimated of the task's sleep rate which is available because it used by the O(1) scheduler's heuristics. With our patches, it is calculated from our estimate of the task's usage because we dispensed with the sleep average calculations as they are no longer needed. We decided to still report sleep average in the status file because we were reluctant to alter the contents of such files in case we broke user space programs.

Generally this is a good move, though I don't expect anything
to be using SleepAVG at the moment.

OK.

Right now, Linux does not report the recent CPU usage
of a process. The UNIX standard requires that "ps"
report this; right now ps substitutes CPU usage over
the whole lifetime of a process.

Both per-task and per-process (tid and tgid) numbers
are needed. Both percent and permill (1/1000) units
get reported, so don't convert to integer percent.

I think a modification to fs/proc/array.c to make this field a per million rather than a percent value would satisfy your needs. It would be a very small change but there would be concerns about breaking programs that rely on it being a percentage.

Nothing can rely on it existing at all, so a name change would
solve the problem of apps getting confused.

BTW, permill is not per-million, it is per-thousand.
Per-million or per-billion would be fine as long as
it doesn't overflow.

OK. Since SleepAVG does not seem to be entrenched in people's expectations and because of the fact that the value calculated from our usage rates are not really valid (see above), I propose that we change this field's name to CPUrate and report the CPU usage rate directly in permill (unless there are violent objections).

Peter
--
Dr Peter Williams, Chief Scientist peterw@xxxxxxxxxx
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Chris Wright: "Re: shmget with SHM_HUGETLB flag: Operation not permitted"
Previous message: Andrew Morton: "Re: /proc visibility patch breaks GDB, etc."
In reply to: Albert Cahalan: "Re: [RFC][PATCH] O(1) Entitlement Based Scheduler"
Next in thread: Peter Williams: "Re: [RFC][PATCH] O(1) Entitlement Based Scheduler"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]