Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

From: Linus Torvalds
Date: Thu Jan 07 2010 - 12:50:24 EST

Next message: Oleg Nesterov: "Re: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)"
Previous message: Mathieu Desnoyers: "Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier"
In reply to: Linus Torvalds: "Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()"
Next in thread: Peter Zijlstra: "Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()"

On Thu, 7 Jan 2010, Linus Torvalds wrote:
>
> Well, I have yet to hear a realistic scenario of _how_ to do it all
> speculatively in the first place, at least not without horribly subtle
> complexity issues. So I'd really rather see how far we can possibly get by
> just improving mmap_sem.

For an example of this: it's entirely possible that one avenue of mmap_sem
improvement would be to look at the _writer_ side, and see how that can be
improved.

An example of where we've done that is in madvise(): we used to always
take it for writing (because _some_ madvise versions needed the exclusive
access). Now we only take it for writing for the versions that actually
need that, and suddenly some operations got way more scalable, and work in
the presence of concurrent page faults.

And quite frankly, I'd _much_ rather look at those kinds of simple and
logically fairly straightforward solutions, instead of doing the whole
speculative page fault work.

For example: there's no real reason why we take mmap_sem for writing when
extending an existing vma. And while 'brk()' is a very old-fashioned way of
doing memory management, it's still quite common. So rather than looking
at subtle lockless algorithms, why not look at handling the common case of
an extending brk? Make that one take the mmap_sem for _reading_, and then
do the extending of the brk area with a simple cmpxchg or something?

And "extending brk" is actually a lot more common than shrinking it, and
is common for exactly the kind of workloads that are often nasty right now
(threaded allocators with lots and lots of smallish allocations).

The thing is, I can pretty much _guarantee_ that the speculative page
fault is going to end up doing a lot of nasty stuff that still needs
almost-global locking, and it's likely to be more complicated and slower
for the single-threaded case (you end up needing refcounts, a new "local"
lock or something).

Sure, moving to a per-vma lock can help, but it doesn't help a lot. It
doesn't help AT ALL for the single-threaded case, and for the
multi-threaded case I will bet you that a _lot_ of cases will have one
very hot vma - the regular data vma that gets shared for normal malloc()
etc.

So I'm personally rather doubtful about the whole speculative work. It's a
fair amount of complexity without any really obvious upside. Yes, the
mmap_sem can be very annoying, but nobody can really honestly claim that
we've really optimized it all that much.

Linus