Re: Performance regression from switching lock to rw-sem foranon-vma tree

From: Ingo Molnar
Date: Tue Jul 30 2013 - 15:25:09 EST

Next message: Winkler, Tomas: "RE: [char-misc 3.11 V2] mei: me: fix hardware reset flow"
Previous message: James Bottomley: "Re: [PATCH 25/36] SCSI: sd: convert class code to use dev_groups"
In reply to: Tim Chen: "Re: Performance regression from switching lock to rw-sem foranon-vma tree"
Next in thread: Davidlohr Bueso: "Re: Performance regression from switching lock to rw-sem foranon-vma tree"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:

> On Tue, 2013-07-23 at 11:53 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > On Tue, Jul 23, 2013 at 11:45:13AM +0200, Ingo Molnar wrote:
> > >
> > > > Why not just try the delayed addition approach first? The spinning is
> > > > time limited AFAICS, so we don't _have to_ recognize those as writers
> > > > per se, only if the spinning fails and it wants to go on the waitlist.
> > > > Am I missing something?
> > > >
> > > > It will change patterns, it might even change the fairness balance -
> > > > but is a legit change otherwise, especially if it helps performance.
> > >
> > > Be very careful here. Some people (XFS) have very specific needs. Walken
> > > and dchinner had a longish discussion on this a while back.
> >
> > Agreed - yet it's worth at least trying it out the quick way, to see the
> > main effect and to see whether that explains the performance assymetry and
> > invest more effort into it.
> >
>
> Ingo & Peter,
>
> Here's a patch that moved optimistic spinning of the
> writer lock acquisition before putting the writer on the
> wait list. It did give me a 5% performance boost on my
> exim mail server workload. It recovered a good portion of
> the 8% performance regression from mutex implementation.

Very nice detective work!

> I think we are on the right track. Let me know if there's
> anything in the patch that may cause grief to XFS.
>
> There is some further optimization possible. We went to
> the failed path within __down_write if the count field is
> non zero. But in fact if the old count field was
> RWSEM_WAITING_BIAS, there's no one active and we could
> have stolen the lock when we perform our atomic op on the
> count field in __down_write. Yet we go to the failed path
> in the current code.
>
> I will combine this change and also Alex's patches on
> rwsem together in a patchset later.
>
> Your comments and thoughts are most welcomed.
>
> Tim
>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>

> +config RWSEM_SPIN_ON_WRITE_OWNER
> + bool "Optimistic spin write acquisition for writer owned rw-sem"
> + default n
> + depends on SMP
> + help
> + Allows a writer to perform optimistic spinning if another writer own
> + the read write semaphore. If the lock owner is running, it is likely
> + to release the lock soon. Spinning gives a greater chance for writer to
> + acquire a semaphore before putting it to sleep.

The way you've worded this new Kconfig option makes it
sound as if it's not just for testing, and I'm not a big
believer in extra Kconfig degrees of freedom for
scalability features of core locking primitives like
rwsems, in production kernels ...

So the bad news is that such scalability optimizations
really need to work for everyone.

The good news is that I don't think there's anything
particularly controversial about making the rwsem write
side perform just as well as mutexes - it would in fact be
a very nice quality of implementation feature: it gives
people freedom to switch between mutexes and rwsems without
having to worry about scalability differences too much.

Once readers are mixed into the workload can we keep the
XFS assumptions, if they are broken in any way?

We are spinning here so we have full awareness about the
state of the lock and we can react to a new reader in very
short order - so at a quick glance I don't see any
fundamental difficulty in being able to resolve it - beyond
the SMOP aspect that is ... :-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Winkler, Tomas: "RE: [char-misc 3.11 V2] mei: me: fix hardware reset flow"
Previous message: James Bottomley: "Re: [PATCH 25/36] SCSI: sd: convert class code to use dev_groups"
In reply to: Tim Chen: "Re: Performance regression from switching lock to rw-sem foranon-vma tree"
Next in thread: Davidlohr Bueso: "Re: Performance regression from switching lock to rw-sem foranon-vma tree"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]