Re: prepare_wait / finish_wait question

From: Andrew Morton
Date: Sun Nov 09 2003 - 05:17:21 EST


Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Ingo,
>
> sysv semaphores show the same problem you've fixed for wait queue with
> finish_wait:

Was me, actually.

> one thread wakes up a blocked thread and must hold a spinlock for the
> wakeup. The blocked thread immediately tries to acquire that spinlock,
> because it must figure out what happened. Result: noticeable cache line
> thrashing on a 4x Xeon with postgres.
> autoremove_wake_function first calls wake_up, then list_del_init. Did
> you test that the woken up thread is not too fast and acquires the
> spinlock before list_del_init had a chance to reset the list?

No, I didn't instrument it. But profiling showed that it was working as
desired. The workload was tons of disk I/O, showing significant CPU time
in the page lock/unlock functions.

It would be neater to remove the task from the list _before_ waking it up.
The current code in there is careful to only remove the task if the wakeup
attempt was successful, but I have a feeling that this is unnecessary - the
waiting task will do the right thing. One would need to think about that a
bit more.
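
To make the two orderings concrete, here is a small userspace model. It is
only a sketch: it is single-threaded, takes no locks, and does not reproduce
the race itself. The list helpers imitate the kernel's list_head API, and
struct waiter, model_wake(), wake_then_unlink(), unlink_then_wake() and
waiter_needs_lock() are hypothetical names invented for illustration, not
kernel functions. The waiter-side check is modelled on the "still queued?"
test that lets finish_wait skip the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink the entry and leave it pointing at itself (i.e. "empty"). */
static void list_del_init(struct list_head *e)
{
    assert(e->next != NULL && e->prev != NULL); /* must be initialised */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

/* Hypothetical waiter; 'runnable' stands in for the task state. */
struct waiter { struct list_head entry; bool runnable; };

/* Models a wakeup attempt: succeeds only if the task was not runnable. */
static bool model_wake(struct waiter *w)
{
    if (w->runnable)
        return false;
    w->runnable = true;
    return true;
}

/* Ordering described above as current: wake, then unlink on success only. */
static bool wake_then_unlink(struct waiter *w)
{
    bool woken = model_wake(w);
    /* Window: the waiter is runnable here but is still on the list. */
    if (woken)
        list_del_init(&w->entry);
    return woken;
}

/* Suggested ordering: unlink unconditionally, then wake. */
static bool unlink_then_wake(struct waiter *w)
{
    list_del_init(&w->entry);
    return model_wake(w);
}

/* Waiter side: the queue lock is needed only if the entry is still queued. */
static bool waiter_needs_lock(const struct waiter *w)
{
    return !list_empty(&w->entry);
}
```

In this model the orderings differ in two places: wake_then_unlink() has a
window in which the waiter is runnable but still queued, and it leaves the
entry queued when the wakeup attempt fails, so in both cases the waiter has
to take the lock. unlink_then_wake() has neither property, which is why
whether the unconditional removal is safe is the part that needs more
thought.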

> I wrote a patch for sysv sem and on a 4x Pentium 3, 99.9% of the calls
> hit the fast path, but I'm a bit afraid that monitor/mwait could be so
> fast that the fast path is not chosen.

Is it not the case that ia32's reschedule IPI is async? If the
architecture's reschedule uses a synchronous IPI then it could indeed be
the case that the woken CPU gets there first.

> I'm thinking about a two-stage algorithm - what's your opinion?

Instrumentation on other architectures would be interesting.