Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

From: Ingo Molnar
Date: Sat Sep 28 2013 - 15:21:35 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, Sep 28, 2013 at 12:41 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> >
> > Yeah, I fully agree. The reason I'm still very sympathetic to Tim's
> > efforts is that they address a regression caused by a mechanical
> > mutex->rwsem conversion:
> >
> > 5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem
> >
> > ... and Tim's patches turn that regression into an actual speedup.
>
> Btw, I really hate that thing. I think we should turn it back into a
> spinlock. None of what it protects needs a mutex or an rwsem.
>
> Because you guys talk about the regression of turning it into a rwsem,
> but nobody talks about the *original* regression.
>
> And it *used* to be a spinlock, and it was changed into a mutex back in
> 2011 by commit 2b575eb64f7a. That commit doesn't even have a reason
> listed for it, although my dim memory of it is that the reason was
> preemption latency.

Yeah, I think it was latency.

> And that caused big regressions too.
>
> Of course, since then, we may well have screwed things up and now we
> sleep under it, but I still really think it was a mistake to do it in
> the first place.
>
> So if the primary reason for this is really just that f*cking anon_vma
> lock, then I would seriously suggest:
>
> - turn it back into a spinlock (or rwlock_t, since we subsequently
> separated the read and write paths)
>
> - fix up any breakage (ie new scheduling points) that this exposes
>
> - look at possible other approaches wrt latency on that thing.
>
> Hmm?

If we do that then I suspect the next step will be queued rwlocks :-/ The
current rwlock_t implementation is rather primitive by modern standards.
(We'd probably have killed rwlock_t long ago if not for the
tasklist_lock.)

But yeah, it would work, and conceptually a hard spinlock fits something as
low-level as the anon-vma lock.

I did a quick review pass and it appears nothing obvious is scheduling
with the anon-vma lock held. If something does schedule in a non-obvious
way, it's likely a bug anyway. The hugepage code grew a lot of logic
running under the anon-vma lock, but it all seems atomic.

So a conversion to rwlock_t could be attempted. (It should be a relatively
easy patch as well, because the locking operation is now nicely abstracted
out.)

Thanks,

Ingo