Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)

From: Peter Zijlstra
Date: Mon Mar 16 2015 - 16:55:44 EST


On Mon, Mar 16, 2015 at 06:53:35PM +0000, Mathieu Desnoyers wrote:
> > I'm not entirely awake atm but I'm not seeing why it would need to be
> > that strict; I think the current single MB on task switch is sufficient
> > because if we're in the middle of schedule, userspace isn't actually
> > running.
> >
> > So from the point of userspace the task switch is atomic. Therefore even
> > if we do not get a barrier before setting ->curr, the expedited thing
> > missing us doesn't matter as userspace cannot observe the difference.
>
> AFAIU, atomicity is not what matters here. It's more about memory ordering.
> What is guaranteeing that upon entry in kernel-space, all prior memory
> accesses (loads and stores) are ordered prior to following loads/stores ?
>
> The same applies when returning to user-space: what is guaranteeing that all
> prior loads/stores are ordered before the user-space loads/stores performed
> after returning to user-space ?

You're still one step ahead of me; why does this matter?

Or put it another way; what can go wrong? By virtue of being in
schedule() both tasks (prev and next) get an effective MB from the task
switch.

So even if we see the 'wrong' rq->curr, that CPU will still observe the
MB by the time it gets to userspace.

All of this is really only about userspace load/store ordering and the
context switch already very much needs to guarantee userspace program
order in the face of context switches.

> > > In order to be able to dereference rq->curr->mm without holding the
> > > rq->lock, do you envision we should protect task reclaim with RCU-sched ?
> >
> > A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> > think Oleg did mention it would still be 'interesting'. I've not yet had
> > time to really think about that.
>
> This might be an "interesting" modification. :) This could perhaps come
> as an optimization later on ?

Not really. Again, take a for (;;) sys_membar(EXPEDITED) loop: that'll
generate horrendous rq lock contention. With or without the PRIVATE
thing, it'll pound a number of rq locks real bad.

Typical scheduler syscalls only affect a single rq lock at a time -- the
one the task is on. This one potentially pounds all of them.
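
To make the difference in lock footprint concrete, here is a rough toy
model in user-space C. It is not kernel code: toy_rq, NR_RQ and the
toy_* helpers are hypothetical names made up for this illustration, and
pthread mutexes merely stand in for the per-CPU rq locks. It only counts
how many lock acquisitions each style of call causes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_RQ 8 /* hypothetical number of CPUs / runqueues */

/* Toy stand-in for a per-CPU runqueue: a lock plus an acquisition counter. */
struct toy_rq {
	pthread_mutex_t lock;
	unsigned long acquisitions;
};

static struct toy_rq toy_rqs[NR_RQ];

/* Initialise every toy runqueue; call once before anything else. */
static void toy_rq_init(void)
{
	for (int i = 0; i < NR_RQ; i++) {
		pthread_mutex_init(&toy_rqs[i].lock, NULL);
		toy_rqs[i].acquisitions = 0;
	}
}

/* Typical scheduler syscall: only the rq the task is on gets locked. */
static void toy_typical_syscall(int cpu)
{
	assert(cpu >= 0 && cpu < NR_RQ);
	pthread_mutex_lock(&toy_rqs[cpu].lock);
	toy_rqs[cpu].acquisitions++;
	pthread_mutex_unlock(&toy_rqs[cpu].lock);
}

/* Expedited-style walk: take each rq lock in turn, as if checking rq->curr. */
static void toy_expedited_walk(void)
{
	for (int i = 0; i < NR_RQ; i++) {
		pthread_mutex_lock(&toy_rqs[i].lock);
		toy_rqs[i].acquisitions++;
		pthread_mutex_unlock(&toy_rqs[i].lock);
	}
}

/* Sum of lock acquisitions over all toy runqueues. */
static unsigned long toy_total_acquisitions(void)
{
	unsigned long sum = 0;

	for (int i = 0; i < NR_RQ; i++)
		sum += toy_rqs[i].acquisitions;
	return sum;
}
```

In this model, calling toy_typical_syscall() in a loop only ever touches
one lock, while each toy_expedited_walk() call touches all NR_RQ of
them, so a tight loop of the latter puts load on every rq lock at once.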