Re: [patch] increase spinlock-debug looping timeouts (write_lock and NMI)

From: Ingo Molnar
Date: Tue Jun 20 2006 - 05:55:20 EST



* Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> Ingo Molnar wrote:
> >Curious, do you have any (relatively) simple-to-run testcase that
> >clearly shows the "scalability issues" you mention above, when going
> >from rwlocks to spinlocks? I'd like to give it a try on an 8-way box.
>
> Arjan van de Ven wrote:
> > I'm curious what scalability advantage you see for rw spinlocks vs real
> > spinlocks ... since for any kind of moderate hold time the opposite is
> > expected ;)
>
> It actually surprised me too, but Peter Chubb (who IIRC provided the
> motivation to merge the patch) showed some fairly significant
> improvement at 12-way.
>
> https://www.gelato.unsw.edu.au/archives/scalability/2005-March/000069.html

I think that workload wasn't analyzed well enough (by us, not by Peter,
who sent a reasonable analysis and suggested a reasonable change), and
we went with whatever magic change appeared to make a difference,
without fully understanding the underlying reasons. Quote:

"I'm not sure what's happening in the 4-processor case."

Now history appears to be repeating itself, just in the other direction
;) And we didn't get one inch closer to understanding the situation for
real. I'd vote for putting a change-moratorium on tree_lock and only
allowing patches that tweak it if they come with a full analysis of the
workload :-)

One thing off the top of my head: doesn't lockstat introduce
significant overhead? Is this reproducible with lockstat turned off
too? Is the same scalability problem visible if all read_lock()s are
changed to write_lock()s? [like I did in my patch] I.e. can other
explanations (like unlucky alignment of certain rwlock data structures
/ functions) be excluded?
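
For illustration, here is a minimal user-space approximation of that
substitution test - not the kernel workload, and the thread/iteration
counts are made up. It drives the same tiny critical section through a
rwlock taken for read, the same rwlock taken for write, and a plain
spinlock; if read-mode still wins with identical hold times, the effect
comes from the lock type itself rather than from alignment accidents:

/*
 * Hedged sketch: user-space stand-in for the read_lock -> write_lock
 * substitution experiment. Build with: gcc -O2 -pthread (add -lrt on
 * old glibc for clock_gettime). THREADS/ITERS are arbitrary.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define THREADS 8
#define ITERS   1000000UL

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_spinlock_t spinlock;
static volatile unsigned long shared_counter;

enum mode { MODE_READ, MODE_WRITE, MODE_SPIN };
static enum mode test_mode;

static void *worker(void *arg)
{
	unsigned long i, sink = 0;

	(void)arg;
	for (i = 0; i < ITERS; i++) {
		switch (test_mode) {
		case MODE_READ:		/* shared (reader) acquisition */
			pthread_rwlock_rdlock(&rwlock);
			sink += shared_counter;
			pthread_rwlock_unlock(&rwlock);
			break;
		case MODE_WRITE:	/* same work, exclusive acquisition */
			pthread_rwlock_wrlock(&rwlock);
			sink += shared_counter;
			pthread_rwlock_unlock(&rwlock);
			break;
		case MODE_SPIN:		/* plain spinlock for comparison */
			pthread_spin_lock(&spinlock);
			sink += shared_counter;
			pthread_spin_unlock(&spinlock);
			break;
		}
	}
	return (void *)sink;
}

static void run(enum mode m, const char *name)
{
	pthread_t tid[THREADS];
	struct timespec t0, t1;
	int i;

	test_mode = m;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < THREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (i = 0; i < THREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%-5s: %.3f s\n", name,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int main(void)
{
	pthread_spin_init(&spinlock, PTHREAD_PROCESS_PRIVATE);
	run(MODE_READ,  "read");
	run(MODE_WRITE, "write");
	run(MODE_SPIN,  "spin");
	return 0;
}

If the read and write variants come out roughly equal here but differ
in the real workload, that would point at alignment / layout accidents
rather than the lock type.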

Another thing: average hold times in the spinlock case on that workload
are below 1 microsecond - probably in the range of cache-miss bounce
costs on such a system. I.e. it's the worst possible case for a
spinlock->rwlock conversion! The only explanation I can believe for
this making a difference is cycle-level races and small random
micro-differences that cause heavier bouncing in the spinlock workload
but happen to avoid it in the read-lock case - not any fundamental
advantage of rwlocks.
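
As a back-of-the-envelope illustration of that order-of-magnitude
claim, here is a user-space sketch (iteration count arbitrary; the
scheduler should spread two busy threads across CPUs, though pinning
them explicitly gives cleaner numbers) that times an atomic increment
on a cacheline bouncing between two threads versus one that stays
local:

/*
 * Hedged sketch: per-op cost of a bouncing cacheline vs a local one.
 * Build with: gcc -O2 -pthread. The bouncing case forces ownership of
 * the line to migrate between CPUs on every increment.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 10000000UL

static volatile unsigned long shared __attribute__((aligned(64)));

/* one cacheline per thread, so these never bounce */
static struct {
	volatile unsigned long v;
	char pad[64 - sizeof(unsigned long)];
} local[2] __attribute__((aligned(64)));

static void *bounce(void *arg)
{
	unsigned long i;

	(void)arg;
	for (i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&shared, 1);	/* line ping-pongs */
	return NULL;
}

static void *stay_local(void *arg)
{
	unsigned long i, id = (unsigned long)arg;

	for (i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&local[id].v, 1);	/* line stays put */
	return NULL;
}

static double ns_per_op(void *(*fn)(void *))
{
	pthread_t a, b;
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_create(&a, NULL, fn, (void *)0);
	pthread_create(&b, NULL, fn, (void *)1);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / ITERS;
}

int main(void)
{
	printf("bouncing line: %.1f ns/op\n", ns_per_op(bounce));
	printf("local line:    %.1f ns/op\n", ns_per_op(stay_local));
	return 0;
}

If the bouncing case comes out several times slower per op, the line
transfer itself is in the same ballpark as the sub-microsecond hold
times above - i.e. the lock handoff cost, not the critical section,
dominates.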

Ingo