Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

From: Srivatsa Vaddagiri
Date: Tue Sep 13 2011 - 13:42:06 EST


* Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> [2011-09-13 18:33:09]:

> On Tue, 2011-09-13 at 21:51 +0530, Srivatsa Vaddagiri wrote:
> > > which increases the time you force a task to sleep that's holding locks etc..
> >
> > Ideally all tasks should get capped at the same time, given that there is
> > a global pool from which everyone pulls bandwidth? So while one vcpu/task
> > (holding a lock) gets capped, other vcpus/tasks (that may want the same lock)
> > should ideally not be running for long after that, avoiding lock inversion
> > related problems you point out.
>
> No this simply cannot be true.. You force groups to sleep so that other
> groups can run, right? Therefore shared kernel locks will cause
> inversion.

Ah ..shared locks of "host" kernel ..true ..that can still cause
lock-inversion yes.

I had in mind user-space (or "guest" kernel) locks - which can't get inverted
that easily (one of cgroup's tasks wanting a "userspace" lock which is held by
another "throttled" task of same cgroup - causing an inversion problem of sorts).
My point was that once a task gets throttled, other sibling tasks should get
throttled almost immediately after that (given that bandwidth for a cgroup is
maintained in a global pool from which everyone draws in "small" increments) -
so a task that gets capped while holding a user-space lock should not
leave other sibling tasks starved on that lock for long within the
same period?
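
To make the "capped almost at the same time" argument concrete, here is a toy
model (not kernel code) of one period. It assumes every cpu runs one
always-runnable task of the cgroup, each cpu pulls a fixed slice from the
global pool whenever its local runtime hits zero, and a cpu is throttled when
the global pool has nothing left to give. The function name, the parameters
and the numbers are made up for illustration; period refresh and tasks that
sleep are deliberately left out.

```python
def simulate_throttle_times(quota_us, slice_us, ncpus):
    """Return, per cpu, the time (in us) at which its task gets throttled.

    Toy model: one always-runnable task per cpu, time advances in 1us steps,
    and no quota refresh happens within the simulated window.
    """
    global_pool = quota_us
    local = [0] * ncpus              # runtime held in each per-cpu pool
    throttled_at = [None] * ncpus
    t = 0
    while any(x is None for x in throttled_at):
        # A cpu whose local pool is empty asks the global pool for a slice.
        for cpu in range(ncpus):
            if throttled_at[cpu] is None and local[cpu] == 0:
                grant = min(slice_us, global_pool)
                global_pool -= grant
                local[cpu] = grant
                if grant == 0:
                    throttled_at[cpu] = t   # nothing left: throttle now
        # Every cpu that is still running burns 1us of local runtime.
        for cpu in range(ncpus):
            if throttled_at[cpu] is None:
                local[cpu] -= 1
        t += 1
    return throttled_at


if __name__ == "__main__":
    times = simulate_throttle_times(quota_us=90000, slice_us=5000, ncpus=4)
    print(times, "spread:", max(times) - min(times), "us")
```

In this model the gap between the first and the last sibling getting throttled
never exceeds one slice. That only holds under the "everyone runs all the
time" assumption; a cpu whose task slept for a while can sit on leftover local
runtime, which is the surplus case discussed further down.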

> You cannot put both groups to sleep and still expect a utilization of
> 100%.
>
> Simple example, some task in group A owns the i_mutex of a file, group A
> runs out of time and gets dequeued. Some other task in group B needs
> that same i_mutex.
>
> > I guess that we may still run into that with current implementation ..
> > Basically global pool may have zero runtime left for current period,
> > forcing a vcpu/task to be throttled, while there is surplus runtime in
> > per-cpu pools, allowing some sibling vcpus/tasks to run for wee bit
> > more, leading to lock-inversion related problems (more idling). That
> > makes me think we can improve directed yield->capping interaction.
> > Essentially when the target task of directed yield is capped, can the
> > "yielding" task donate some of its bandwidth?
>
> What moron ever calls yield anyway?

I meant directed yield (yield_to) ..which is used by KVM when it detects
pause-loops. Essentially, a vcpu spinning in guest-kernel context for too long
triggers a PLE (Pause-Loop-Exit), which leads to the KVM driver doing a directed
yield to another sibling vcpu ..so the target of directed yield may be a
capped vcpu task, in which case I was wondering if directed yield can donate
a bit of bandwidth to the throttled task. Again, going by what I said earlier
about tasks getting capped more or less at the same time, this should occur very
infrequently ...something for me to test and find out nevertheless!
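
A rough sketch of what such a donation could look like, purely as a thought
experiment: nothing like this exists in the scheduler today as far as this
thread goes, and the Vcpu class, its fields and
directed_yield_with_donation() are hypothetical names invented for the
example. The idea is only that the spinning vcpu hands part of its remaining
local runtime to the throttled target, so the group's total runtime stays the
same.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    local_runtime_us: int      # runtime left in this vcpu's per-cpu pool
    throttled: bool = False


def directed_yield_with_donation(yielder, target, donate_us):
    """Hypothetical: move up to donate_us of runtime to a throttled target.

    Returns the amount actually donated (0 if no donation took place).
    """
    if not target.throttled:
        return 0                       # target can run already
    donated = min(donate_us, yielder.local_runtime_us)
    if donated <= 0:
        return 0                       # yielder has nothing to give
    yielder.local_runtime_us -= donated
    target.local_runtime_us += donated
    target.throttled = False           # target may run on donated time
    return donated


if __name__ == "__main__":
    spinner = Vcpu("vcpu0", local_runtime_us=3000)
    holder = Vcpu("vcpu1", local_runtime_us=0, throttled=True)
    print(directed_yield_with_donation(spinner, holder, 1000), spinner, holder)
```

Since the runtime is moved rather than created, the cgroup still cannot exceed
its quota for the period; whether that is enough to let the lock holder make
progress is exactly what would need testing.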

> If you use yield you're doing it wrong!