Re: [PATCH v2 0/6] Rework REFCOUNT_FULL using atomic_fetch_* operations

From: Will Deacon
Date: Fri Sep 06 2019 - 09:43:09 EST


On Wed, Aug 28, 2019 at 02:03:37PM -0700, Kees Cook wrote:
> On Wed, Aug 28, 2019 at 03:14:40PM +0100, Will Deacon wrote:
> > On Wed, Aug 28, 2019 at 09:30:52AM +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 27, 2019 at 05:31:58PM +0100, Will Deacon wrote:
> > > > Will Deacon (6):
> > > > lib/refcount: Define constants for saturation and max refcount values
> > > > lib/refcount: Ensure integer operands are treated as signed
> > > > lib/refcount: Remove unused refcount_*_checked() variants
> > > > lib/refcount: Move bulk of REFCOUNT_FULL implementation into header
> > > > lib/refcount: Improve performance of generic REFCOUNT_FULL code
> > > > lib/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
>
> BTW, can you repeat the timing details into the "Improve performance of
> generic REFCOUNT_FULL code" patch?

Of course.

> > > So I'm not a fan; I itch at the whole racy nature of this thing and I
> > > find the code less than obvious. Yet, I have to agree it is exceedingly
> > > unlikely the race will ever actually happen, I just don't want to be the
> > > one having to debug it.
> >
> > FWIW, I think much the same about the version under arch/x86 ;)
> >
> > > I've not looked at the implementation much; does it do all the same
> > > checks the FULL one does? The x86-asm one misses a few iirc, so if this
> > > is similarly fast but has all the checks, it is in fact better.
> >
> > Yes, it passes all of the REFCOUNT_* tests in lkdtm [1] so I agree that
> > it's an improvement over the asm version.
> >
> > > Can't we make this a default !FULL implementation?
> >
> > My concern with doing that is I think it would make the FULL implementation
> > entirely pointless. I can't see anybody using it, and it would only exist
> > as an academic exercise in handling the theoretical races. That's a change
> > from the current situation where it genuinely handles cases which the
> > x86-specific code does not and, judging by the Kconfig text, that's the
> > only reason for its existence.
>
> Looking at timing details, the new implementation is close enough to the
> x86 asm version that I would be fine to drop the x86-specific case
> entirely as long as we could drop "FULL" entirely too -- we'd have _one_
> refcount_t implementation: it would be both complete and fast.

That works for me; I'll spin a new version of this series so you can see
what it looks like.

> However, I do think a defconfig image size comparison should be done as
> part of that too. I think this implementation will be larger than the
> x86 asm one (but not by any amount that I think is a problem).

I've managed to get it down to +0.5% when comparing an x86_64 defconfig
before these changes, to one afterwards with REFCOUNT_FULL enabled:

Total: Before=14762076, After=14835497, chg +0.50%

Will