Re: [RFC] Lock free fd lookup

From: William Lee Irwin III
Date: Fri Jul 16 2004 - 20:22:40 EST


On Thu, 15 Jul 2004 23:27:32 -0700, William Lee Irwin III wrote:
>> The gist of all this is that busywait-free atomic updates are only
>> implementable for data structures that don't link through the object
>> but rather maintain an external index with a single pointer to elements
>> needing updates, like radix trees, B+ trees, arrays of pointers, and
>> open-addressed hashtables.

On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> There are algorithms for busywait-free (lock free) traversal and
> concurrent update of lists and structures that contain embedded
> pointers. I once had to implement one on an O/S where the user space
> to kernel wait/alarm semantics could not satisfy the timing
> requirements of the calling code (don't ask).

Yes. I had in mind only RCU-based algorithms. Throwing more machinery at
the problem may get further.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The beauty of these lockfree algorithms on large structures is that
> nothing ever stalls indefinitely. If the underlying SMP cache hardware
> is fair, everything running a lockfree list will make progress. These
> algorithms do not suffer from reader vs. writer starvation.

That's a large assumption. NUMA hardware typically violates it.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The downside is that the algorithms rely on an extra sequence field per
> word. They copy out the structure, modify the local copy, write back
> with an update of sequence field. Writing back detects if any other
> update has invalidated the current structure, at which point the second
> update is discarded and the writer redrives its update. Readers are
> guaranteed to see a consistent view of the copy, even if the master
> structure is being updated at the same time as it is being read.

I would classify this as "ticket-based".


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Bottom line, it can be done but ...
> * The structure size doubles with the addition of the sequence fields.
> * The hardware must support cmpxchg on the combination of the sequence
> field and the data word that it protects. LL/SC can be used instead
> of cmpxchg if the hardware supports LL/SC.
> * If the hardware only supports cmpxchg4 then the sequence field is
> restricted to 2 bytes, which increases the risk of wraparound. If an
> update is delayed for exactly the wraparound interval then the data
> may be corrupted, that is a standard risk of small sequence fields.
> * The extra copy out just to read a structure needs more memory
> bandwidth, plus local allocation for the copy.
> * The internal code of the API to traverse a list containing lockfree
> structures is pretty messy, although that is hidden from the caller.
> * All writers must preserve enough state to redrive their update from
> scratch if a concurrent update has changed the same structure. Note,
> only the curent structure, not the rest of the list.
> * It requires type stable storage. That is, once a data area has been
> allocated to a particular structure type, it always contains that
> structure type, even when it has been freed from the list. Each list
> requires its own free pool, which can never be returned to the OS.

The last of these is particularly lethal.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Nice research topics, but not suitable for day to day work. I only
> used the lockfree algorithms because I could not find an alternative on
> that particular OS. I am not sure that the Linux kernel has any
> problems that require the additional complexity.
> For some light reading, the code I implemented was based on Practical
> Implementations of Non-Blocking Synchronization Primitives by Mark Moir
> http://research.sun.com/scalable/Papers/moir-podc97.ps. Note that page
> 6, line 6 has an error. Replace "y := z" with "y := addr->data[i];".
> I implemented a completely lockfree 2-3-4 tree with embedded pointers
> using Moir's algorithm. The implementation allowed multiple concurrent
> readers and writers. It even allowed concurrent list insert, delete
> and rebalancing, while the structures on the were being read and
> updated.

Thanks. That may come in handy later.

On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> 2-3-4 trees are self balancing, which gave decent lookup performance.
> Since red-black trees are logically equivalent to 2-3-4 trees it should
> be possible to use lockfree red-black trees. However I could not come
> up with a lockfree red-black tree, mainly because an read-black insert
> requires atomic changes to multiple structures. The 2-3-4 tree only
> needs atomic update to one structure at a time.

This actually appears to confirm my earlier assertion about the linkage
of the data structure. Is this conclusion what you had in mind?


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/