Re: [PATCH 2/4] sched: Document Program-Order guarantees

From: Paul Turner
Date: Mon Nov 02 2015 - 17:09:56 EST


On Mon, Nov 2, 2015 at 12:34 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, Nov 02, 2015 at 12:27:05PM -0800, Paul Turner wrote:
>> I suspect this part might be more explicitly expressed by specifying
>> the requirements that migration satisfies; then providing an example.
>> This makes it easier for others to reason about the locks and saves
>> worrying about whether the examples hit our 3 million sub-cases.
>>
>> I'd also propose just dropping preemption from this part, we only need
>> memory order to be correct on migration, whether it's scheduled or not
>> [it also invites confusion with the wake-up case].
>>
>> Something like:
>> When any task 't' migrates, all activity on its prior cpu [c1] is
>> guaranteed to happen-before any subsequent execution on its new
>> cpu [c2]. There are 3 components to enforcing this.
>>
>> [c1] 1) Sched-out of t requires rq(c1)->lock
>> [any cpu] 2) Any migration of t, by any cpu is required to synchronize
>> on *both* rq(c1)->lock and rq(c2)->lock
>> [c2] 3) Sched-in of t requires rq(c2)->lock
>>
>> Transitivity guarantees that (2) orders after (1) and (3) after (2).
>> Note that in some cases (e.g. active, or idle cpu) the balancing cpu
>> in (2) may be c1 or c2.
>>
>> [Follow example]
>
> Makes sense, I'll try and reword things like that.
>
> Note that we don't actually need the strong transitivity here (RCsc),
> weak transitivity (RCpc) is in fact sufficient.

Yeah, I thought about just using acquire/release to talk about the
interplay, in particular pairing the release in (1) with the acquire
in (3), which would make some of this much more explicit and
highlight that we only need RCpc. We have not been very consistent in
using this terminology, although this could be a good starting point.
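
As a rough userspace illustration of that pairing (C11 atomics and
pthreads, not kernel code): the names toy_task, sched_out_on_c1(),
sched_in_on_c2() and run_handoff() are made up for this sketch, and the
two threads merely stand in for c1 and c2.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int state;          /* plain data written while running on c1 */
	int seen;           /* what c2 observed after taking over */
	atomic_int on_cpu;  /* 1 while c1 still owns the task */
};

/* c1: last activity of the task, then the RELEASE of step (1). */
static void *sched_out_on_c1(void *arg)
{
	struct toy_task *t = arg;

	t->state = 42;  /* activity on the prior cpu */
	atomic_store_explicit(&t->on_cpu, 0, memory_order_release);
	return NULL;
}

/* c2: the ACQUIRE of step (3), then first execution on the new cpu. */
static void *sched_in_on_c2(void *arg)
{
	struct toy_task *t = arg;

	while (atomic_load_explicit(&t->on_cpu, memory_order_acquire))
		;  /* wait until c1 has let go of the task */

	/* Ordered after everything c1 did before its release. */
	t->seen = t->state;
	return NULL;
}

/* Runs one hand-off; returns what c2 saw, or -1 on thread failure. */
int run_handoff(void)
{
	struct toy_task t = { .state = 0, .seen = -1 };
	pthread_t c1, c2;

	atomic_init(&t.on_cpu, 1);
	if (pthread_create(&c2, NULL, sched_in_on_c2, &t))
		return -1;
	if (pthread_create(&c1, NULL, sched_out_on_c1, &t)) {
		/* Unblock c2 so it can be joined, then report failure. */
		atomic_store_explicit(&t.on_cpu, 0, memory_order_release);
		pthread_join(c2, NULL);
		return -1;
	}
	pthread_join(c1, NULL);
	pthread_join(c2, NULL);
	return t.seen;
}
```

Only a release/acquire pair on on_cpu is used here, no full barrier,
which is the RCpc point: c2 is guaranteed to see state == 42 once it
observes on_cpu == 0.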

If we went this route, we could do something like:

+ * So in this case the scheduler does not provide an obvious full barrier; but
+ * the smp_store_release() in finish_lock_switch(), paired with the control-dep
+ * and smp_rmb() in try_to_wake_up() form a release-acquire pair and fully
+ * order things between CPU0 and CPU1.

Instead of having this, which is complete but hard to keep synchronized
with the points at which it actually matters, just use acquire and
release above; then at the actual site, e.g. in try_to_wake_up(),
document how we deliver the acquire required by the higher-level
documentation/requirements.

This makes it easier to maintain the stupidly racy documentation
consistency property in the future.
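
To sketch what "document it at the site" could look like, here is a
userspace approximation in C11, not the actual scheduler code. The
toy_* names and run_wakeup_demo() are invented for illustration, and
the acquire fence after a relaxed spin is only an assumed analogue of
the control-dep plus smp_rmb() combination (C11 has no direct
equivalent of smp_rmb()).

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical task stand-in for this sketch only. */
struct toy_rq_task {
	int payload;        /* written by the previous cpu */
	atomic_int on_cpu;  /* 1 until the previous cpu is done */
};

/* Stand-in for the switch-out side. */
static void *toy_finish_switch(void *arg)
{
	struct toy_rq_task *p = arg;

	p->payload = 7;
	/* RELEASE: publishes all prior stores done on this cpu. */
	atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
	return NULL;
}

/* Stand-in for the wake-up side. */
static int toy_wake_up(struct toy_rq_task *p)
{
	/*
	 * ACQUIRE, as required by the top-level ordering comment.
	 * Delivered here in two parts: a relaxed spin until the
	 * previous cpu is done, then an acquire fence that orders
	 * the later loads after the load that ended the spin.
	 */
	while (atomic_load_explicit(&p->on_cpu, memory_order_relaxed))
		;
	atomic_thread_fence(memory_order_acquire);

	return p->payload;
}

/* Runs one wake-up; returns the payload seen, or -1 on failure. */
int run_wakeup_demo(void)
{
	struct toy_rq_task p = { .payload = 0 };
	pthread_t prev;
	int seen;

	atomic_init(&p.on_cpu, 1);
	if (pthread_create(&prev, NULL, toy_finish_switch, &p))
		return -1;
	seen = toy_wake_up(&p);
	pthread_join(prev, NULL);
	return seen;
}
```

The top-level text then only has to say "release in (1), acquire in
(3)", and each site's comment says how it provides its half.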