Re: [PATCH] Lightweight userspace semaphores...

From: Hubertus Franke (frankeh@watson.ibm.com)
Date: Mon Mar 04 2002 - 11:51:43 EST


On Mon, Mar 04, 2002 at 12:30:40AM +1100, Rusty Russell wrote:
> On Sat, 2 Mar 2002 09:50:31 -0500
> Hubertus Franke <frankeh@watson.ibm.com> wrote:
>

I am pretty much ready to support Rusty's latest patch. I think
much of the stuff I mentioned as been addressed.
The hash collision elimination the biggest one.
The page pinning is a great idea and I see that to be a great
advantage over what I tried to push.

> > On Wed, Feb 27, 2002 at 11:24:17AM +1100, Rusty Russell wrote:
> > > - We can no longer do generic semaphores: mutexes only.
> > > - Requires arch __update_count atomic op.
> >
> > I don't see that, you can use atomic_inc on 386s ..
>
> __update_count needed to be able to do the decrement from 0 or
> the count, whichever was greater. I have eliminated this in the
> new patch (patch III).
>
> > (a) the hashing is to simple. As a consequence there are two limitations
> > - the total number of locks that can be waited for is
> > (1<< USEM_HASHBITS).
> > that is way not enough (as you pointed out).
> > - this has to be bumped up to roughly the number of tasks in the
> > system.
>
> OK. We dealt with hash collisions by sharing waitqueues. This works, but
> means we have to do wake-all. My new patch uses its own queues: we still
> share queues, but we only wake the first one waiting on our counter.
>
> > (c) your implementation could be easily expanded to include Ben's
> > suggestion of multiple queues (which I already implement in my
> > submission), by including the queue index into the hashing mechanism.
>
> Hmmm.... it should be pretty easy to extend: the queues can be shared.
>

Correct, should we bring that into the interface already.
Expand the syscall with an additional parameter identifying the queue.

> > (f) Why get away from semaphores and only stick to mutexes. Isn't there
> > some value to provide counting semaphores ? Effectively, mutexes are
> > a subclass of semaphores with a max-count of 1.
>
> Yes, but counting semaphores need two variables, or cmpxchg. Mutexes do not
> need either (proof by implementation 8). While general counting semaphores
> may be useful, the fact that they are not implemented in the kernel suggests
> they are not worth paying extra for.

Not exactly, they need one atomic in user space and a semaphore in the kernel
which as you stated uses (atomic and sleepers).

>
> > (g) I don't understand the business of rechecking in the kernel. In your
> > case it comes cheap as you pinned the page already. In general that's
> > true, since the page was just touched the pinning itself is reasonable
> > cheap. Nevertheless, I am concerned that now you need intrinsic knowledge
> > how the lock word is used in user space. For instance, I use it in
> > different fashions, dependent whether its a semaphore or rwlock.
>
> This is a definite advantage: my old fd-based code never looked at the
> userspace counter: it had a kernel sempahore to sleep on for each
> userspace lock. OTOH, this involves kernel allocation, with the complexity
> that brings.
>

???, kernel allocation is only required in under contention. If you take a look
at how I used the atomic word for semaphores and for rwlock_t, its different.
If you recheck in the kernel you need to know how this is used.
In my approach I simply allocated two queues on demand.

I am OK with only supplying semaphores and rwlocks, if there is a general consent
about it. Nevertheless, I think the multiqueue approach is somewhat more elegant
and expandable.

> > (h) how do you deal with signals ? E.g. SIGIO or something like it.
>
> They're interruptible, so you'll get -ERESTARTSYS. Should be fine.
>

That's what I did too, but some folks pointed out they might want to
interrupt a waking task, so that it can get back into user space and
fail with EAGAIN or so and do not automatically restart the syscall.

> > Overall, pinning down the page (assuming that not a lot of tasks are
> > waiting on a large number of queues at the same time) is acceptable,
> > and it cuts out one additional level of indirection that is present
> > in my submission.
>
> I agree. While originally reluctant, when it's one page per sleeping
> process it's rather neat.
>

Yipp, most applications that will take advantage of this will allocate
the locks anyway in a shared region, i.e. there will be more than
one lock per page.

> Cheers!
> Rusty.
> --

As I said above, I am favor for the general infrastructure that you have in
place now. Can we hash out the rwlock issues again.
They can still be folded into a single queue (logical) and you wakeup
eiher the next writer or set of readers, or all readers ....
We need a mean to bring that through the API. I think about it.

Great job Rusty, pulling all these issues together with a comprehensive patch.
The few remaining issues are cravy :-)

> Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

-- Hubertus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 07 2002 - 21:00:34 EST