Re: [PATCH 26/31] sched, numa, mm: Add fault driven placement andmigration policy

From: Linus Torvalds
Date: Fri Oct 26 2012 - 12:47:39 EST


On Fri, Oct 26, 2012 at 7:14 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>>
>> Shouldn't the pte_lock serialize all that still? All sites
>> that modify PTE contents should hold the pte_lock (and do
>> afaict).
>
> Hm, indeed.
>
> Is there no code under down_read() (in the page fault path) that
> modifies the pte via just pure atomics?

Well, the ptep_set_access_flags() thing modifies the pte under
down_read(). Not using atomics, though. If it races with itself or
with a hardware page walk, that's fine, but if it races with something
changing other bits than A/D, that would be horribly horribly bad - it
could undo any other bit changes exactly because it's a unlocked
read-do-other-things-write sequence.

But it's always run under the page table lock - as should all other SW
page table modifications - so it *should* be fine. The down_read() is
for protecting other VM data structures (notably the vma lists etc),
not the page table bit-twiddling.

In fact, the whole SW page table modification scheme *depends* on the
page table lock, because the ptep_modify_prot_start/commit thing does
a "atomically clear the page table pointer to protect against hardware
walkers". And if another software walker were to see that cleared
state, it would do bad things (the exception, as usual, is the GUP
code, which does the optimistic unlocked accesses and conceptually
emulates a hardware page table walk)

So I really think that the mmap_sem should be entirely a non-issue for
this kind of code.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/