Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)

From: Mathieu Desnoyers
Date: Mon Mar 16 2015 - 14:53:48 EST


----- Original Message -----
> From: "Peter Zijlstra" <peterz@xxxxxxxxxxxxx>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx, "KOSAKI Motohiro" <kosaki.motohiro@xxxxxxxxxxxxxx>, "Steven Rostedt"
> <rostedt@xxxxxxxxxxx>, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>, "Nicholas Miell" <nmiell@xxxxxxxxxxx>,
> "Linus Torvalds" <torvalds@xxxxxxxxxxxxxxxxxxxx>, "Ingo Molnar" <mingo@xxxxxxxxxx>, "Alan Cox"
> <gnomes@xxxxxxxxxxxxxxxxxxx>, "Lai Jiangshan" <laijs@xxxxxxxxxxxxxx>, "Stephen Hemminger"
> <stephen@xxxxxxxxxxxxxxxxxx>, "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx>, "Josh Triplett" <josh@xxxxxxxxxxxxxxxx>,
> "Thomas Gleixner" <tglx@xxxxxxxxxxxxx>, "David Howells" <dhowells@xxxxxxxxxx>, "Nick Piggin" <npiggin@xxxxxxxxx>
> Sent: Monday, March 16, 2015 1:21:04 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
>
> On Mon, Mar 16, 2015 at 03:43:56PM +0000, Mathieu Desnoyers wrote:
> > > On which; I absolutely hate that rq->lock thing in there. What is
> > > 'wrong' with doing a lockless compare there? Other than not actually
> > > being able to deref rq->curr of course, but we need to fix that anyhow.
> >
> > If we can make sure rq->curr deref could be done without holding the rq
> > lock, then I think all we would need is to ensure that updates to rq->curr
> > are surrounded by memory barriers. Therefore, we would have the following:
> >
> > * When a thread is scheduled out, a memory barrier would be issued before
> > rq->curr is updated to the next thread task_struct.
> >
> > * Before a thread is scheduled in, a memory barrier needs to be issued
> > after rq->curr is updated to the incoming thread.
>
> I'm not entirely awake atm but I'm not seeing why it would need to be
> that strict; I think the current single MB on task switch is sufficient
> because if we're in the middle of schedule, userspace isn't actually
> running.
>
> So from the point of userspace the task switch is atomic. Therefore even
> if we do not get a barrier before setting ->curr, the expedited thing
> missing us doesn't matter as userspace cannot observe the difference.

AFAIU, atomicity is not what matters here; memory ordering is. What
guarantees that, upon entry into kernel space, all prior memory accesses
(loads and stores) are ordered before the loads/stores that follow?

The same applies when returning to user space: what guarantees that all
prior loads/stores are ordered before the user-space loads/stores
performed after the return?
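
For reference, the ordering I proposed above (a barrier before rq->curr
is updated on schedule-out, and one after it on schedule-in) could be
modeled roughly as follows in user-space C11. This is only a sketch:
"struct task", "struct rq" and switch_curr() are hypothetical stand-ins,
not the actual scheduler code, and the C11 fences stand in for whatever
barrier the kernel would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task_struct and runqueue. */
struct task { int pid; };
struct rq { _Atomic(struct task *) curr; };

/*
 * Model of a task switch with a full barrier on each side of the
 * rq->curr update. Returns the task that was scheduled out.
 */
static struct task *switch_curr(struct rq *rq, struct task *next)
{
	struct task *prev;

	assert(next != NULL);

	/* Order the outgoing task's prior accesses before the update. */
	atomic_thread_fence(memory_order_seq_cst);
	prev = atomic_exchange_explicit(&rq->curr, next,
					memory_order_relaxed);
	/* Order the update before the incoming task's later accesses. */
	atomic_thread_fence(memory_order_seq_cst);

	return prev;
}
```

With both fences in place, a lockless reader of rq->curr that sees the
old task knows that task's accesses are ordered before the switch, and
one that sees the new task knows the update precedes that task's
accesses.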

>
> > In order to be able to dereference rq->curr->mm without holding the
> > rq->lock, do you envision we should protect task reclaim with RCU-sched ?
>
> A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> think Oleg did mention it would still be 'interesting'. I've not yet had
> time to really think about that.

This might be an "interesting" modification. :) Perhaps it could come
as an optimization later on?

By the way, I now remember why we start from the mm_cpumask, and then
double-check the mm: using the mm_cpumask serves as an approximation
of the CPUs we need to double-check. Therefore, rather than grabbing
the rq lock for all CPUs, we only need to grab it for CPUs that are
in the mm_cpumask.
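
A simplified user-space model of that filtering, to illustrate the idea
only: NR_CPUS, "struct cpu_state" and cpus_to_ipi() are hypothetical
names, the mask is a plain bitmask rather than a real cpumask, and the
rq lock is represented by a counter instead of actual locking.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8

struct mm { int id; };

/* Hypothetical model of what rq->curr->mm is on each CPU. */
struct cpu_state {
	const struct mm *curr_mm;
};

/*
 * Returns a bitmask of the CPUs currently running @mm. Only CPUs set
 * in @cpumask are inspected; *nr_checked counts how many runqueues
 * would have had their lock taken for the double-check.
 */
static unsigned int cpus_to_ipi(const struct cpu_state cpus[NR_CPUS],
				unsigned int cpumask,
				const struct mm *mm,
				unsigned int *nr_checked)
{
	unsigned int targets = 0;

	assert(nr_checked != NULL);
	*nr_checked = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(cpumask & (1u << cpu)))
			continue;	/* not in the mask: no lock taken */
		(*nr_checked)++;	/* the rq lock would be taken here */
		if (cpus[cpu].curr_mm == mm)
			targets |= 1u << cpu;
	}
	return targets;
}
```

With a mask covering 3 of 8 CPUs, only 3 runqueues are visited, and a
stale bit in the mask (a CPU that has since switched to another mm) is
filtered out by the double-check.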

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com