Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v5)

From: Peter Zijlstra
Date: Tue Jan 19 2010 - 13:39:30 EST


On Thu, 2010-01-14 at 14:33 -0500, Mathieu Desnoyers wrote:
> It's a case where CPU 1 switches from our mm to another mm:
>
> CPU 0 (membarrier)                  CPU 1 (another mm -> our mm)
> <user-space>                        <user-space>
>                                     <buffered access C.S. data>
>                                     urcu read unlock()
>                                       barrier()
>                                       store local gp
>                                     <kernel-space>

OK, so the question is how we end up here. If it's through interrupt
preemption, I think the interrupt delivery will imply an mb; if it's a
blocking syscall, the set_task_state() mb [*] should be there.

Then we also do:

clear_tsk_need_resched()

which is an atomic bitop (although it does not imply a full barrier
per se).

>                                     rq->curr = next (1)
> memory access before membarrier
> <call sys_membarrier()>
> smp_mb()
> mm_cpumask includes CPU 1
> rcu_read_lock()
>   if (cpu_curr(1)->mm != our mm)
>     skip CPU 1 -> here, rq->curr new version is already visible
> rcu_read_unlock()
> smp_mb()
> <return to user-space>
> memory access after membarrier
>   -> this is where we allow freeing
>      the old structure although the
>      buffered access C.S. data is
>      still in flight.
>                                     User-space access C.S. data (2)
>                                     (buffer flush)
>                                     switch_mm()
>                                       smp_mb()
>                                       clear_mm_cpumask()
>                                       set_mm_cpumask()
>                                       smp_mb() (by load_cr3() on x86)
>                                     switch_to()
>                                       <buffered current = next>
>                                       <switch back to user-space>
>                                     current = next (1) (buffer flush)
>                                     access critical section data (3)
>
> As we can see, the reordering of (1) and (2) is problematic, as it lets
> the check skip over a CPU that has global side-effects not committed to
> memory yet.

Right, this one I get, thanks!


So about that [*], Oleg, kernel/signal.c:SYSCALL_DEFINE0(pause) does:

SYSCALL_DEFINE0(pause)
{
	current->state = TASK_INTERRUPTIBLE;
	schedule();
	return -ERESTARTNOHAND;
}

Isn't that ->state assignment buggy? If so, there appear to be quite a
few such sites, which worries me.
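
To make the concern concrete, here is a minimal userspace C11 sketch (not
kernel code) of the two ways of setting ->state. The names task_sketch,
plain_set_state() and set_state_mb() are invented for illustration. The
assumption is that set_current_state()/set_task_state() behave like a store
followed by a full barrier, whereas a bare assignment like the one in pause()
gives no ordering at all.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel's task states and task_struct. */
enum { SKETCH_TASK_RUNNING = 0, SKETCH_TASK_INTERRUPTIBLE = 1 };

struct task_sketch {
	atomic_int state;
};

/* Analogue of "current->state = s": the store carries no ordering. */
static void plain_set_state(struct task_sketch *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
}

/*
 * Analogue of set_current_state(): the same store, followed by a full
 * barrier, intended to keep later accesses from being reordered before
 * the store.
 */
static void set_state_mb(struct task_sketch *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}
```

Both helpers leave the same value in ->state; only the second one provides
the barrier that the argument above leans on when a task blocks.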
