Re: [patchlet] sched: fix rt throttle runtime borrowing

From: Mike Galbraith
Date: Tue Mar 08 2011 - 09:25:28 EST


On Tue, 2011-03-08 at 14:46 +0100, Peter Zijlstra wrote:
> On Tue, 2011-03-08 at 14:27 +0100, Mike Galbraith wrote:
>
> > > Also, how much of a problem is it really? When I start a FIFO spinner on
> > > my machine I can still ssh in and kill the thing.
> >
> > It's a problem if you have one box. Also, try starting a hefty load
> > then having an rt task go nuts. Nothing good happens here.
>
> Right, so I think we're not aggressive enough to migrate tasks away from
> very small cpu_power CPUs, trapping tasks on such CPUs.

I don't think that's it at all. I just tried (again) with virgin tip.
Start a kbuild, start an RT hog. Instant frozen box. If you're in a
console shell, you may or may not save the box, but the desktop is
instantly toast, and there is no ssh possibility. I can ping the box,
but that's it. It's a pingable doorstop.

> Of course, this is no help for pinned tasks.. but then you get what you
> asked for isn't it ;-)

But events/N are pinned, and kinda critical.

> > > Not allowing 100% FIFO usage on SMP is going to make it very very hard
> > > to implement any kind of fifo-cgroup stuff.
> >
> > The only thing I care much about is the default setup. The safety net
> > should work, otherwise it's a waste.
>
> Right, but how much trouble can be avoided by making the sched_fair
> load-balancer migrate tasks away from very small cpu_power CPUs?

I don't think that's the issue. I think when any events/N thread is
blocked forever, it's game over.

> It won't avoid actual deadlocks when someone tries to wait for workqueue
> broadcasts and the like, but how much of that is actually happening?

Mmmmm.. enough to kill my box every time I test? :)

> > Maybe only doing the borrow thing when there are active RT groups is the
> > right thing to do. (minus knob)
>
> Thing is the whole borrowing needs to go, Dario and me finally came up
> with a 'sane' way to implement fifo-cgroups, but that does include
> explicitly allowing starving CPUs.

Hm, borrowing going away sounds great. Dunno about that starving CPUs
bit, that has never led to anything but BRB poking here.

> Not allowing that very quickly degenerates into massive trouble like
> gang-scheduling or bouncing tasks around like mad and generally messing
> up the 'load-balancer'.

If the problematic code is going away anyway, I'll just leave it. It's
the same problem that has existed since the dawn of time. RT hogs can
be utterly deadly a while longer I suppose :)
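
For reference, the safety net in question is the RT throttle, controlled by
the sched_rt_period_us and sched_rt_runtime_us sysctls. Below is a minimal
sketch, assuming the usual /proc/sys/kernel location for those knobs, that
reports the share of each period RT tasks are nominally allowed. The helper
names (rt_budget_fraction, read_knob) are hypothetical, and the figure is only
the configured budget: with runtime borrowing, a single CPU may end up with
more than this share.

```python
from pathlib import Path

# Assumed location of the RT throttle sysctls.
PROC = Path("/proc/sys/kernel")


def rt_budget_fraction(runtime_us: int, period_us: int) -> float:
    """Nominal fraction of each period that RT tasks may consume."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        # A runtime of -1 disables throttling, so RT tasks may use everything.
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_knob(name: str):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    try:
        return int((PROC / name).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    runtime = read_knob("sched_rt_runtime_us")
    period = read_knob("sched_rt_period_us")
    if runtime is None or period is None:
        print("RT throttle sysctls not readable on this system")
    else:
        frac = rt_budget_fraction(runtime, period)
        print(f"RT tasks may nominally use {frac:.0%} of each {period} us period")
```

With a runtime of 950000 us and a period of 1000000 us, this reports a 95%
budget, leaving 5% of each period for everything else.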

-Mike
