Re: [PATCH] sched,x86: optimize switch_mm for multi-threaded workloads

From: Rik van Riel
Date: Wed Jul 31 2013 - 18:16:22 EST


On 07/31/2013 06:07 PM, Linus Torvalds wrote:
> On Wed, Jul 31, 2013 at 2:43 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
>>
>> The cause turned out to be unnecessary atomic accesses to the
>> mm_cpumask. When in lazy TLB mode, the CPU is only removed from
>> the mm_cpumask if there is a TLB flush event.
>>
>> Most of the time, no such TLB flush happens, and the kernel
>> skips the TLB reload. It can also skip the atomic memory
>> set & test.
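
For reference, the lazy TLB branch ends up looking roughly like this
with the patch applied (paraphrased sketch, not the exact hunk):

        if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                /* A TLB flush removed us from the mask while we
                 * were lazy, so page tables may have been freed;
                 * rejoin the mask and reload CR3. */
                cpumask_set_cpu(cpu, mm_cpumask(next));
                load_cr3(next->pgd);
                load_LDT_nolock(&next->context);
        }

In the common case the plain cpumask_test_cpu read finds the bit still
set, and both the TLB reload and the atomic set_bit are skipped.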

> The patch looks obvious, and I'm not seeing any very clear reasons for
> why we would want that test-and-set to be atomic. That said, I'd like
> to have some explicit comments about exactly why it doesn't need the
> atomicity. Because afaik, there actually are concurrent readers _and_
> writers of that mask, and the accesses are not locked by anything
> here.
>
> I _think_ the reason for this all being safe is simply that the only
> real race is "We need to set the bit before we load the page table,
> and we're protected against that bit being cleared because the TLB
> state is TLBSTATE_OK and thus TLB flushing will no longer leave that
> mm".

> But damn, it all looks subtle as hell. That code does:
>
>         this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
>         BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
>
>         if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next))) {
>
> and I'm wondering if we need a barrier to make sure that that
> TLBSTATE_OK write happens *before* we test the cpumask. With
> test_and_set(), we have the barrier in the test-and-set. But with just
> test_bit, I'm not seeing why the compiler couldn't re-order them. I
> suspect it won't, but...

cpumask_set_cpu expands to set_bit, which is atomic.

Do we need any explicit compiler barrier in addition to the
atomic operation performed by set_bit?
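
If it turns out we do, the conservative fix would be a one-liner along
these lines (sketch only, assuming the worry is purely the compiler
reordering the plain read):

        this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
        BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);

        /* Don't let the compiler hoist the non-atomic cpumask
         * read above the TLBSTATE_OK write. */
        barrier();

        if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                cpumask_set_cpu(cpu, mm_cpumask(next));
                load_cr3(next->pgd);
                load_LDT_nolock(&next->context);
        }

On x86 the lock-prefixed instruction behind set_bit is a full barrier,
but it only executes on the branch where the bit gets set; barrier()
also covers the common path where the bit is already set and
everything is skipped.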

I would be happy to rewrite the comment right above the
cpumask_set_cpu call if you want.

--
All rights reversed