Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load

From: Mathieu Desnoyers
Date: Wed Sep 04 2019 - 11:19:03 EST


----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Tue, Sep 3, 2019 at 1:25 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>> Why can't we frob this state into a line/word we already have to
>> unconditionally touch, like the thread_info::flags word for example.
>
> I agree, but we don't have any easily used flags left, I think.
>
> But yes, it would be better to not have membarrier always dirty
> another cacheline in the scheduler. So instead of
>
>     atomic_set(&t->membarrier_state,
>                atomic_read(&t->mm->membarrier_state));
>
> it might be better to do something like
>
>     if (mm->membarrier_state)
>         atomic_or(&t->membarrier_state, mm->membarrier_state);
>
> or something along those lines - I think we've already brought in the
> 'mm' struct into the cache anyway, and we'd not do the write (and
> dirty the destination cacheline) for the common case of no membarrier
> usage.
>
> But yes, it would be better still if we can re-use some already dirty
> cache state.

Considering the alternative proposed by PeterZ, which is to iterate over
all processes/threads from an unprivileged process, I would be tempted
to put some more thought into the mm->membarrier_state cache-line. Do
we expect it to be typically hot? Is there anything we can do to move
this field into a typically hot mm cacheline?

I agree with your approach, which aims to only load that field in the
common case (no store unless membarrier is actually in use).
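
To make the load-only common case concrete, here is a rough userspace
C11 sketch. It is not kernel code: mm_sketch, task_sketch and
sync_membarrier_state() are hypothetical names made up for this
illustration, and C11 atomics stand in for the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct mm_sketch {
	atomic_int membarrier_state;
};

struct task_sketch {
	atomic_int membarrier_state;
	struct mm_sketch *mm;
};

/*
 * Propagate the mm state into the task only when there is something to
 * propagate. Returns 1 if the task's cacheline was written, 0 if the
 * function got away with a single load of the mm field.
 */
static int sync_membarrier_state(struct task_sketch *t)
{
	int state;

	assert(t != NULL && t->mm != NULL);

	/* Common case: membarrier is unused, so this load is all we do. */
	state = atomic_load_explicit(&t->mm->membarrier_state,
				     memory_order_relaxed);
	if (!state)
		return 0;

	/* Rare case: dirty the task's cacheline to record the state. */
	atomic_fetch_or_explicit(&t->membarrier_state, state,
				 memory_order_relaxed);
	return 1;
}
```

With a zero mm state the function returns 0 and leaves the task field
untouched; once a bit is set in the mm state, the same call ORs it into
the task state and returns 1.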

>
> I wonder if the easiest model might be to just use a percpu variable
> instead for the membarrier stuff? It's not like it has to be in
> 'struct task_struct' at all, I think. We only care about the current
> runqueues, and those are percpu anyway.

One issue here is that membarrier iterates over all runqueues without
grabbing any runqueue lock. If we copy that state from mm to rq on
sched switch prepare, we would need to ensure we have the proper
memory barriers between:

  prior user-space memory accesses / setting the runqueue membarrier state

and

  setting the runqueue membarrier state / following user-space memory accesses

Copying the membarrier state into the task struct leverages the fact that
we have documented and guaranteed those barriers around the rq->curr update
in the scheduler.
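
As a rough illustration of that last point, here is a userspace C11
sketch. Everything in it is an assumption made for the example: sk_task,
sk_rq, sk_switch_to() and sk_scan_needs_ipi() are hypothetical names, and
the seq_cst fences merely stand in for the barriers the scheduler
guarantees around the rq->curr update. The idea is that the state travels
with the task pointer, so the single rq->curr store publishes both.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's task_struct or rq. */
struct sk_task {
	atomic_int membarrier_state;
};

struct sk_rq {
	_Atomic(struct sk_task *) curr;
};

/* Scheduler side: copy the state into the task, then publish the task. */
static void sk_switch_to(struct sk_rq *rq, struct sk_task *next, int mm_state)
{
	atomic_store_explicit(&next->membarrier_state, mm_state,
			      memory_order_relaxed);
	/* Stand-in: barrier between prior accesses and the curr update. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
	/* Stand-in: barrier between the curr update and following accesses. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* membarrier side: lockless look at the runqueue's current task. */
static int sk_scan_needs_ipi(struct sk_rq *rq, int wanted)
{
	struct sk_task *p;

	atomic_thread_fence(memory_order_seq_cst);
	p = atomic_load_explicit(&rq->curr, memory_order_acquire);
	if (p == NULL)
		return 0;
	return (atomic_load_explicit(&p->membarrier_state,
				     memory_order_relaxed) & wanted) != 0;
}
```

A separate per-runqueue state word would be a second store on the
scheduler side, and it would need its own ordering against user-space
accesses on both sides, which is the concern raised above.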

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com