Re: [PATCH RFC] sched: deferred set priority (dprio)

From: Mike Galbraith
Date: Mon Jul 28 2014 - 03:24:56 EST


On Sun, 2014-07-27 at 18:19 -0700, Andi Kleen wrote:
> Sergey Oboguev <oboguev.public@xxxxxxxxx> writes:
>
> > [This is a repost of the message from a few days ago, with the patch
> > file inline instead of being pointed to by the URL.]
>
> Have you checked out the preemption control that was posted some time
> ago? It did essentially the same thing, but somewhat simpler than your
> patch.
>
> http://lkml.iu.edu/hypermail/linux/kernel/1403.0/00780.html
>
> Yes I agree with you that lock preemption is a serious issue that needs solving.

Yeah, it's a problem, and well known.

One mitigation mechanism that exists in the stock kernel today is the
LAST_BUDDY scheduler feature. That took pgsql benchmarks from "shite"
to "shiny", and specifically targeted this issue.

Another is SCHED_BATCH, which can solve a lot of the lock problem by
eliminating wakeup preemption within an application. One could also
create an extended batch class which is immune not only from other
SCHED_BATCH and/or SCHED_IDLE tasks, but from all SCHED_NORMAL wakeup
preemption as well. Trouble is that killing wakeup preemption precludes
very fast, very light tasks competing with hogs for CPU time. If your
load depends upon these performing well, you have a problem.
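
For reference, opting a task into SCHED_BATCH from userspace is a single
call. The following is only a minimal sketch: set_batch_policy() is a
hypothetical helper name, not something from the patch under discussion,
and it assumes glibc on Linux (SCHED_BATCH needs _GNU_SOURCE).

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: switch the calling thread to SCHED_BATCH.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_batch_policy(void)
{
    /* Non-realtime policies require a static priority of 0. */
    struct sched_param param = { .sched_priority = 0 };

    /* pid 0 means "the caller". */
    return sched_setscheduler(0, SCHED_BATCH, &param);
}
```

Each thread of the application would call this once at startup, after
which the kernel treats it as a batch task for wakeup preemption decisions.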

Mechanism #3 is use of realtime scheduler classes. This one isn't
really a mitigation mechanism, it's more like donning a super suit.

So three mechanisms exist, the third being supremely effective, but high
frequency usage is expensive, ergo huge patch.
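
To illustrate where the expense comes from, here is a rough sketch of the
naive approach: bump to SCHED_FIFO around a critical section and drop back
afterwards. run_boosted() and increment() are hypothetical names made up
for this illustration, not code from the dprio patch, and the boost
normally needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: run critical(arg) at SCHED_FIFO priority rt_prio,
 * then restore the previous policy. The critical section runs either way.
 * Returns 0 if the boost was applied, -1 (errno set) if it was not.
 */
int run_boosted(void (*critical)(void *), void *arg, int rt_prio)
{
    struct sched_param old_param;
    struct sched_param rt_param = { .sched_priority = rt_prio };
    int old_policy = sched_getscheduler(0);
    int boosted = 0;
    int saved_errno;

    /* Every call below is a trip into the kernel. */
    if (old_policy >= 0 && sched_getparam(0, &old_param) == 0)
        boosted = (sched_setscheduler(0, SCHED_FIFO, &rt_param) == 0);
    saved_errno = errno;

    critical(arg);

    if (boosted) {
        /* The set/restore pair alone costs two syscalls per section. */
        sched_setscheduler(0, old_policy, &old_param);
        return 0;
    }
    errno = saved_errno;
    return -1;
}

/* Example critical section: bump a counter passed via arg. */
void increment(void *arg)
{
    ++*(int *)arg;
}
```

Done once in a while this is harmless; done around every short lock hold,
the syscall overhead can easily dwarf the critical section itself.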

The lock holder preemption problem being identical to the problem RT
faces with kernel locks...

A lazy preempt implementation a la RT wouldn't have the SCHED_BATCH
problem, but would have a problem of its own: should critical sections
not be as tiny as they should be, every time you dodge preemption you're
fighting the fair engine, and may pay heavily in terms of scheduling
latency. Not a big hairy deal: if it hurts, don't do that. The bigger
issue is that you have to pop into the kernel on lock acquisition and
release to avoid jabbering with the kernel via some public phone.
Popping into the kernel, if say some futex were victimized, also erases
the "f" in futex, and restricting cost to consumer won't be any easier.

The difference wrt cost acceptability is that the RT issue is not a
corner case, it's a core issue resulting from the nature of the RT beast
itself, so the feature not being free is less annoying. A corner case
fix OTOH should not impact the general case at all.

Whatever outcome, I hope it'll be tiny. 1886 ain't tiny.

-Mike
