Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

From: Nick Piggin
Date: Thu Aug 28 2008 - 08:04:14 EST


On Thursday 28 August 2008 20:54, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> > On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> > > Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> > > > Well, we might have a public opinion poll, whether a system is
> > > > declared frozen after 1, 10 or 100 seconds. Even a one second
> > > > unresponsiveness shows up on the kernel bugzilla and you request that
> > > > unlimited unresponsiveness w/o a chance to debug it is the sane
> > > > default.
> > >
> > > That assumes single CPU. With multiple CPUs and not
> > > all hogged the system should be still responsive?
> >
> > Right.
>
> Wrong.
>
> Even if the system has multiple CPUs, and even if just a single CPU is
> fully utilized by an RT task, without the rt-limit the system will still
> lock up in practice due to various other factors: workqueues and tasks
> being 'stuck' on CPUs that host an RT hog. While there's obviously CPU
> time available on other CPUs, you cannot run 'top', the desktop will
> freeze, work flows of the system can be stuck, etc, etc..

No, it is right. With caveats. Because you can pretty well isolate a
CPU from running kernel threads or work. At any rate, I don't think it
is your decision to just mandate this.


> With the rt limit in place, it's all pretty smooth and debuggable. Even
> with all CPUs hogged by SCHED_FIFO prio 99 the system is laggy but
> debuggable - the user can run 'top' and can resolve the situation.

When I write rt apps, I run a watchdog thread which detects a hung
task and kills it.
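
[Editorial sketch, not part of the original mail.] The watchdog pattern
mentioned above could look roughly like the following. Everything named
here (is_hung, watchdog, read_heartbeat, the timeout values) is
hypothetical and chosen for illustration. The sketch assumes the
monitored task periodically publishes a monotonic timestamp, and that
the watchdog itself runs at a higher real-time priority than the task
it supervises, otherwise it would never get to run.

```python
import os
import signal
import time


def is_hung(last_beat: float, now: float, timeout: float) -> bool:
    """True if the last heartbeat is older than `timeout` seconds."""
    return (now - last_beat) > timeout


def watchdog(pid, read_heartbeat, timeout=1.0, poll=0.1,
             kill=os.kill, clock=time.monotonic, sleep=time.sleep):
    """Poll a heartbeat and SIGKILL `pid` once it goes stale.

    `read_heartbeat` is a hypothetical callable returning the
    timestamp last published by the monitored task (for example
    read from shared memory). `kill`, `clock` and `sleep` are
    injectable so the loop can be exercised without a real
    process.

    Assumption: the caller has already raised this thread above
    the monitored task, e.g. with
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio)),
    which normally needs root or CAP_SYS_NICE.
    """
    while True:
        if is_hung(read_heartbeat(), clock(), timeout):
            kill(pid, signal.SIGKILL)
            return True
        sleep(poll)
```

The point of the pattern is that the application, not the scheduler,
decides what counts as a hang and what to do about it.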


> Really, this reply of yours shows something startling: that despite this
> many mails you still have never actually tried to run the scenario you
> are complaining about: you have never tried to run a CPU hog high-prio
> RT task on a Linux system before, and you have never observed the
> effects it has on general system stability and debuggability.

Of course I have and of course I know what it does if you run a
for (;;) rt thread on an ordinary Linux desktop system. Trying to
"fix" that for people is not a good reason to break the API.


> This fundamental lack of experience weakens all your arguments and i
> don't even know why you are arguing about it. Do you perhaps have some
> customer application/workload you are worried about? If you have then
> please tell us about the exact specifics - this handwaving about
> compliance really makes little sense.

You're continually ignoring all of my arguments and instead raising
irrelevant things like this.

You ignored others in this thread who replied with real uses of the
rt scheduling that is being prevented by this API breakage, and
you're ignoring my examples of how it could be used and just keep
asserting that "anybody who does that is broken anyway".

You also ignored when I told you how you can fix this correctly by
introducing new SCHED_xxx scheduling policies that won't break
backwards compatibility and will be defined from the outset to be
throttled as such.

There is no customer issue and there is no handwaving about compliance;
it is a black and white issue: this behaviour contradicts all
documentation, previous Linux behaviour, and other systems.


> In other words: in our car the air-bag continues to be enabled by
> default, and if someone wants to use the car for stunts the air-bag can
> be disabled via that handy sysctl.

How am I supposed to respond to that? My car doesn't have an air bag
but its brakes don't stop working every 10 seconds.
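
[Editorial sketch, not part of the original mail.] The sysctl is not
named in the thread. In mainline kernels the rt limit is exposed as
kernel.sched_rt_runtime_us and kernel.sched_rt_period_us, where a
runtime of -1 means no throttling. Assuming those two files, the
fraction of each period available to real-time tasks can be computed
as below. The helper names rt_share and read_rt_limit are hypothetical,
and the numbers in the usage note are example inputs only.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed location of the rt-limit sysctls on a Linux system.
PROC_KERNEL = Path("/proc/sys/kernel")


def rt_share(runtime_us: int, period_us: int) -> Optional[float]:
    """Fraction of each period that RT tasks may consume.

    Returns None when runtime_us is negative (-1 disables the
    limit, so RT tasks are not throttled at all).
    """
    if runtime_us < 0:
        return None
    return min(runtime_us / period_us, 1.0)


def read_rt_limit(base: Path = PROC_KERNEL) -> Tuple[int, int]:
    """Read (runtime_us, period_us) from the sysctl files under `base`."""
    runtime = int((base / "sched_rt_runtime_us").read_text())
    period = int((base / "sched_rt_period_us").read_text())
    return runtime, period
```

For example, rt_share(950000, 1000000) gives 0.95, meaning RT tasks are
held off for the remaining 5% of every period, while rt_share(-1,
1000000) gives None, which is the unthrottled behaviour argued for here.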

Can we stop with the car and gun analogies now?


> In any case i think i'm going to ignore this thread from now on, nothing
> new has been said really, just the general tone of discussion is
> deteriorating.

OK, if you don't wish to have further discussion then I will submit a
patch to Linus and I'll see what he says.


> You are also very late with raising objections in any
> case - the rt-limit feature has been posted 10 months ago and went
> upstream 8 months ago - two full kernel cycles have been completed with
> this change in place and a third one has almost been finished.

So what?