Re: Stolen and degraded time and schedulers

From: Dan Hecht
Date: Thu Mar 15 2007 - 15:10:19 EST


On 03/14/2007 02:18 PM, Jeremy Fitzhardinge wrote:
Dan Hecht wrote:
On 03/14/2007 01:31 PM, Jeremy Fitzhardinge wrote:
Dan Hecht wrote:
Sounds good. I don't see this in your patchset you sent yesterday
though; did you add it after sending out those patches?
Yes.

if so, could you forward the new patch? does it explicitly prevent
stolen time from getting accounted as user/system time or does it
just rely on NO_HZ mode sort of happening to work that way (since the
one shot timer is skipped ahead for missed ticks)?
Hm, not sure. It doesn't care how often it gets called; it just
accumulates results up to that point, but I'm not sure if the time would
get double accounted. Perhaps it doesn't matter when using
xen_sched_clock().

I think you might be double counting time in some cases. sched_clock() isn't really relevant to stolen time accounting (i.e.
cpustat->steal).

I think what you want is to make sure that the sum of the cputime
passed to all of:

account_user_time
account_system_time
account_steal_time

adds up to the total amount of time that has passed. I think it is
sort of working for you (i.e. doesn't always double count stolen
ticks) since in NO_HZ mode, update_process_time (which calls
account_user_time & account_system_time) happens to be skipped during
periods of stolen time due to the hrtimer_forward()'ing of the one
shot expiry.

OK, this will need a closer look.

BTW, what are the properties of the vmi read_available_cycles()? Is
that a per-cpu timer? If its used as the timebase for sched_clock, how
does recalc_task_prio not get a -ve sleep time?


Available time is defined to be (real_time - stolen_time). i.e. time in which the vcpu is either running or not ready to run [because it is halted, and nothing is pending]).

So, yes, it is per-vcpu. But, the sched_clock() samples are rebased when processes are migrated between runqueues; search sched.c for most_recent_timestamp. It's not perfect since most_recent_timestamp between cpu0 and cpu1 do not correspond to the exact same instant, but does prevent negative sleep time and is fairly close.

Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/