Re: rcuref optimization

From: Joe Seigh
Date: Thu Dec 22 2005 - 11:14:50 EST


Nick Piggin wrote:
Joe Seigh wrote:

You can get rid of the requirement for atomic_inc_not_zero logic
if you use the logic I first proposed here in c.l.c++.m.
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3E7C83DD.B126DE24%40xemaps.com

for weakptrs where the same kind of logic was required for the strong count.
This will allow you to use fetch_inc (e.g. LOCK INC on x86) instead of compare
and swap logic which might be more efficient on some processors. You might
even be able to get rid of the the "unincrement" if you are pretty sure the
maximum number of increments won't put the refcount to zero.


Clever idea.

I don't know... atomic_inc_not_zero is implemented very easily on the
many architectures without SMP, and I think it *could* be implemented
very nicely on ll/sc based architectures without using cmpxchg.

Lastly, your InterlockedIncrement and InterlockedDecrement are not
actually atomic_inc (LOCK INC), but atomic_inc_return (XADD). Another
primitive like atomic_inc_return_negative or something could be added
to take advantages of status flags and use LOCK INC, but this will
probably not be worthwhile for any architecture other than i386/x86-64
(ie. it will be plain worse on most ll/sc and UP-only architectures
once they get around to implementing atomic_inc_not_zero properly)

In other words, if you *don't* implement it, it will be plain worse
on architectures that do have native atomic_inc. (sorry, couldn't resist :) )

Also, the extra logic and atomic op in the decrement-to-zero case
takes a bit of shine off it even for i386. I'd say we should stick
to what we have unless we see some really compelling numbers.

The extra atomic op should be on a somewhat infrequently executed path
if you're in a typical readers/writers situation, which I think is the
case. Probably RCU lookups that return by reference rather than by value.

On the read side, the atomic increment might make a difference with less
conditional branches needed but it probably matters more what the overall
contribution to performance is. If the refcount increment is only 1% of
overall cost then even if an atomic increment is 2 or 3 times faster than
the present scheme, it probably wouldn't matter that much.

That said, the refcount can be considered an opaque data type and if you
don't hard code the implementation into your api naming scheme, you can
use the present implementation and alter some platform implementations
later if it did look like there was an advantage. E.g., go back to
rcuref_inc/dec and have them return success/fail instead of the value
of the refcount so you can keep it as an opaque data type.

I may even go add GC guarded refcounting to my atomic-ptr-plus project.
I hadn't done it before since I didn't have a good RCU for preemptive
user threads implementation and besides I already had atomically threadsafe
refcounted smart pointers that didn't depend on some other form of GC.
But it occurs to me that it will work fine with the proxy GC and with
the RCU+SMR where you might want to keep even longer term references
than would be feasible to use a hazard pointer for.


--
Joe Seigh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/