Re: Robust futexes

From: Ingo Molnar
Date: Fri Feb 17 2006 - 04:12:53 EST



* Paul Jackson <pj@xxxxxxx> wrote:

> So the point is that we only have to cleanup the stale locks of dead
> threads when some other task has the misfortune of trying to take the
> orphaned lock and gets forced into a wait.
>
> The wait call essentially becomes a "wait unless said other TID is
> dead, in which case, a new owner is summarily declared."

the fundamental problem i see here: how do you 'declare' a TID as dead?
32-bit TIDs can be reused, quite fundamentally. A 64-bit TID space with
no wrapover was suggested before in this discussion, but that's not
possible for many reasons (ABI impact, user impact, and due to futexes
being designed as 32-bit variables).

also, CPU support is problematic: not all 32-bit CPUs we support can do
64-bit atomic ops. The moment the TID cannot be handled atomically, we
are back to square 1 and to ->list_op_pending type of techniques.

[ and even a 64-bit TID space might be too narrow: lets fast forward 5
years and assume a CPU that can create/destroy a thread in 0.1 usecs
(right now we can do that in ~1 usec), and assume a total number of
2048 cores within the system [say 128x16], a 64-bit TID space, with 3
high bits set aside, could wrap around in 3 years. That's just a
single order of magnitude away from being 'months' and causing
practical problems. And that assumes the most 'compressed' variant: a
central TID counter - not including things like clustering or
scalability enhancements by partitioning the TID space along CPUs. ]

also, this approach has a futex performance issue: at every FUTEX_WAIT
we'd have to look up the TID, just to make sure that it isnt dead. This
will likely be an expensive operation [it probably needs to take the
tasklist_lock, to ensure that the task _really_ isnt dead] - and futexes
are supposed to be lightweight, even in their in-kernel slowpath. While
with the userspace-list based approach, the normal in-kernel futex path
is not impacted _at all_ - and even in the failure case, the TID only
has to be matched against current->pid.

but yes, in theory, if we had a unique ID (per bootup) for every task
started in the system, things would be somewhat simpler in some areas.
In practice though, even if all the other (big) hurdles are overcome,
handling that unique ID likely needs similar techniques as handling the
list, and wont perform as well as the userspace-list based approach.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/