Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

From: Dan Hecht
Date: Tue Mar 06 2007 - 20:45:44 EST


On 03/06/2007 05:22 PM, Thomas Gleixner wrote:
> On Tue, 2007-03-06 at 16:42 -0800, Dan Hecht wrote:
>>>> accounting would be wrong. Instead, we should allow the tick_sched_timer in cases (c) and (d) to have a runtime-configurable period, and then scale the time value accordingly before passing it to account_system_time. This is probably something the Xen folks will want also, since I think Xen itself only gets a 100 Hz hard timer, and so it can implement at best a oneshot virtual timer with 100 Hz resolution. Any objections to us doing something like this?
>>> Yes. It's gross hackery.
>>>
>>> 1) We want to have a cleanup of the tick assumptions _all_ over the
>>> place and this is going to be real hard work.
>>>
>>> 2) As I said above. The time accounting for virtualization needs to be
>>> fixed in a generic way.
>>>
>>> I'm not going to accept some weird hackery for virtualization, which is
>>> of exactly ZERO value for the kernel itself. Quite the contrary, it will
>>> make the cleanup harder and introduce another hard-to-remove thing,
>>> which will in the worst case last forever.
>>
>> Okay, to confirm I'm on the same page as you: you want to move process time accounting from being periodic and sample-based to being trace-based? I.e., at the system-call/interrupt boundaries, read the clocksource and directly compute the amount of system/user/process time?
>
> At least for the paravirt guests this is the correct approach. Once the
> CPU vendors come up with a sane solution for a reliable and fast clock
> source we might use that on real hardware as well.


I thought your preference was to not do things differently from real hardware? I guess you are okay with this case since you'd like to see the real hardware case follow eventually?

In any case, in paravirt the costs of reading timers and doing system call transitions are a bit different than on native, so we'll need to figure out what makes sense given those costs.
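To make the trace-based idea concrete, here is a minimal userspace sketch in C. All names (trace_acct, trace_acct_switch, and so on) are hypothetical and are not kernel API; the timestamp is passed in as a parameter where real code would read a clocksource, so the cost of that read, which is the open question here, is not modeled at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is existing kernel API. */
enum cpu_mode { MODE_USER, MODE_SYSTEM };

struct trace_acct {
    uint64_t user_ns;    /* time charged to user mode so far */
    uint64_t system_ns;  /* time charged to kernel mode so far */
    uint64_t last_ns;    /* clock reading taken at the last boundary */
    enum cpu_mode mode;  /* mode the task has been in since last_ns */
};

static void trace_acct_init(struct trace_acct *a, uint64_t now_ns,
                            enum cpu_mode mode)
{
    a->user_ns = 0;
    a->system_ns = 0;
    a->last_ns = now_ns;
    a->mode = mode;
}

/*
 * Called at every system-call/interrupt entry or exit with a fresh clock
 * reading. The interval since the previous boundary is charged to the
 * mode being left, then the new mode is recorded.
 */
static void trace_acct_switch(struct trace_acct *a, uint64_t now_ns,
                              enum cpu_mode new_mode)
{
    assert(now_ns >= a->last_ns);  /* assumes a monotonic clock */

    uint64_t delta = now_ns - a->last_ns;

    if (a->mode == MODE_USER)
        a->user_ns += delta;
    else
        a->system_ns += delta;

    a->last_ns = now_ns;
    a->mode = new_mode;
}
```

Every boundary crossing costs one clock read plus a few additions, so the overhead grows with the system call rate, whereas a sampled tick costs the same regardless of workload.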

>> Do you know if anyone has explored this? I thought there was a discussion about this a while back, but it was rejected because the sample-based approach has much lower overhead on workloads with high system call rates.
>
> Yes, with today's hardware it is simply a PITA. PowerPC has some basic
> support for this though, IIRC.


I think S390 might too.
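For comparison, the scaling proposal quoted at the top of this message (a tick with a runtime-configurable period, scaled before it reaches account_system_time) comes down to the arithmetic in the userspace sketch below. The names ACCT_HZ, sampled_acct and account_scaled_tick are hypothetical, the 1000 Hz accounting rate is an assumed value, and the real account_system_time interface is deliberately not reproduced. This shows only the arithmetic and does nothing to answer the objections raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed rate at which the accounting code expects ticks (hypothetical). */
#define ACCT_HZ 1000u

struct sampled_acct {
    uint64_t system_ticks;  /* ticks charged to kernel mode */
    uint64_t user_ticks;    /* ticks charged to user mode */
};

/* Number of accounting ticks that one timer event stands for. */
static uint32_t ticks_per_timer_event(uint32_t timer_hz)
{
    /* This sketch only handles rates that divide ACCT_HZ evenly. */
    assert(timer_hz > 0 && ACCT_HZ % timer_hz == 0);
    return ACCT_HZ / timer_hz;
}

/*
 * Sample-based accounting with a scaled tick: whatever mode the CPU was in
 * when the slower timer fired is charged for the whole scaled interval.
 */
static void account_scaled_tick(struct sampled_acct *a, int in_kernel,
                                uint32_t timer_hz)
{
    uint32_t scale = ticks_per_timer_event(timer_hz);

    if (in_kernel)
        a->system_ticks += scale;
    else
        a->user_ticks += scale;
}
```

With a 100 Hz timer and the assumed 1000 Hz accounting rate, each timer event is charged as 10 ticks, so the totals stay on the expected scale while the sampling itself becomes ten times coarser.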