Re: [PATCH v3 3/7] sched: throttle cfs_rq entities which exceed their local quota

From: Paul Turner
Date: Thu Oct 14 2010 - 06:26:14 EST


On Thu, Oct 14, 2010 at 3:08 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Thu, 14 Oct 2010 11:59:55 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Thu, 2010-10-14 at 18:50 +0900, KAMEZAWA Hiroyuki wrote:
>> > On Thu, 14 Oct 2010 11:12:22 +0200
>> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> > > On Wed, 2010-10-13 at 15:34 +0900, KAMEZAWA Hiroyuki wrote:
>> > > > cpu.share and bandwidth control can't be used simultaneously or...
>> > > > is this fair ? I'm not familiar with scheduler but this allows boost this tg.
>> > > > Could you add brief documentation of the spec/feature in the next post ?
>> > >
>> > > Like explained, shares control the proportional distribution of time
>> > > between groups, bandwidth puts a limit on how much time a group can
>> > > take. It can cause a group to receive less than its fair share, but
>> > > never more.
>> > >
>> > > There is, however, a problem with all this, and that is that all this
>> > > explicit idling of tasks can lead to a form of priority inversion.
>> > > Regular preemptive scheduling already suffers from this, but explicitly
>> > > idling tasks exacerbates the situation.
>> > >
>> > > You basically get to add the longest induced idle time to all your lock
>> > > hold times.
>> > >
>> >
>> > What is the user-visible difference in this problem between
>> >   1) limiting the share to be very small, and
>> >   2) using throttling?
>> >
>> > If shares are used, is the lock holder's priority boosted ?
>>
>> No, both lead to the same problem, it's just that this adds another
>> dimension to it.. and I'm fairly sure people won't realise this until it
>> bites them in the ass.
>>
> Hmm, so it is an existing problem, but this adds a new pitfall.
>
> What's your recommendation to make progress on this work ?
>
> I think 1st step will be..
> - explain the problem of priority inversion in the cgroup+cfs documentation with
>  !!CAUTION!!
>
> I'm sorry, I'm not sure whether there have been attempts at fixing priority
> inversion in Linux scheduler development.
>
> To explain my motivation: one user of this feature among my customers is a
> virtual machine rental service. So some functionality such as
> "When a vcpu holds a spinlock in the kernel, please don't sleep.." would be nice.
> Is there a patch already ?
>


Per above:

When a group exceeds its bandwidth we don't actively force it off the
cpu, we only set TIF_NEED_RESCHED; we won't process the throttling until
we drop back down to userspace and handle the flag.

This means we'll never throttle a task that is holding a spinlock.

We'll also only throttle the holder of a sleepable lock (one that
doesn't disable preemption) when it voluntarily reschedules without
releasing the lock, at which point it has chosen to open itself to an
arbitrary latency period anyway.
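
To make the deferral concrete, here is a rough userspace model of the
behaviour described above. It is only a sketch, not code from the
patchset: every name in it (struct model_task, model_account(),
model_resched_point(), and so on) is made up for illustration, and the
preemption counter merely stands in for "a spinlock is held".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct model_task {
	long runtime_left;	/* quota remaining for the task's group */
	int  preempt_count;	/* > 0 while a (model) spinlock is held */
	bool need_resched;	/* stands in for the reschedule flag */
	bool throttled;		/* set only once throttling is processed */
};

/* Charge runtime. Exhausting the quota only marks the task; it is
 * never forced off the cpu at this point. */
static void model_account(struct model_task *t, long delta)
{
	t->runtime_left -= delta;
	if (t->runtime_left <= 0)
		t->need_resched = true;
}

/* Taking a spinlock disables preemption in this model. */
static void model_spin_lock(struct model_task *t)
{
	t->preempt_count++;
}

static void model_spin_unlock(struct model_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* A point where the flag is handled. The pending throttle is only
 * processed here, and only if preemption is enabled.
 * Returns true if the task was throttled by this call. */
static bool model_resched_point(struct model_task *t)
{
	if (t->preempt_count > 0 || !t->need_resched)
		return false;
	t->need_resched = false;
	t->throttled = true;
	return true;
}
```

In this model a task that runs out of quota while holding the lock keeps
running with the flag set, and is only throttled at the first
model_resched_point() call after model_spin_unlock().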

**

The case of a guest cpu holding spinlocks is part of a much larger
rabbit hole that is spinlock enlightenment which should occur via
pvops/etc interaction. The sane thing for this to do would be to (at
least) preempt_disable() at which point the vcpu will be protected
from throttling.

This seems somewhat orthogonal to this patchset however.

**

Agreed that priority inversion across threads and across vcpus are
rather sickly beasts; especially given how bare the curtains are on the
first case (which the second can only really build upon).

>
> Thanks,
> -Kame