Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Andrea Parri
Date: Fri Jul 13 2018 - 15:57:05 EST


On Fri, Jul 13, 2018 at 06:42:39PM +0200, Peter Zijlstra wrote:
>
> Hi Michael,
>
> On Fri, Jul 13, 2018 at 11:15:26PM +1000, Michael Ellerman wrote:
> > I reran some numbers today with some slightly updated tests.
> >
> > It varies quite a bit across machines and CPU revisions.
> >
> > On one I get:
> >
> > Lock/Unlock Time Time % Total Cycles Cycles Cycles Delta
> > lwsync/lwsync 79,290,859,955 100.0 % 290,160,065,087 145 -
> > lwsync/sync 104,903,703,237 132.3 % 383,966,199,430 192 47
> >
> > Another:
> >
> > Lock/Unlock Time Time % Total Cycles Cycles Cycles Delta
> > lwsync/lwsync 71,662,395,722 100.0 % 252,403,777,715 126 -
> > lwsync/sync 84,932,987,977 118.5 % 299,141,951,285 150 23
> >
> >
> > So 18-32% slower, or 23-47 cycles.
>
> Very good info. Note that another option is to put the SYNC in lock() it
> doesn't really matter which of the two primitives gets it. I don't
> suppose it really matters for timing either way around.
>
> > Next week I can do some macro benchmarks, to see if it's actually
> > detectable at all.
> >
> > The other question is how they behave on a heavily loaded system.
> >
> >
> > My personal preference would be to switch to sync, we don't want to be
> > the only arch finding (or not finding!) exotic ordering bugs.
> >
> > But we'd also rather not make our slow locks any slower than they have
> > to be.
>
> I completely understand, but I'll get you beer (lots) if you do manage
> to make SYNC happen :-) :-)

One trivia about seems due: it's of course very easy to stick a full
or a "tso" fence in one's spin_lock() implementation, or to tight the
semantics of such a primitive; removing this fence, or weakening the
semantics is another matter...

(/me reminding about that spin_is_locked() discussion...)

Andrea