Re: [patch] speed up / fix the new generic semaphore code (fix AIM7 40% regression with 2.6.26-rc1)

From: Ingo Molnar
Date: Thu May 08 2008 - 16:20:40 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 8 May 2008, Linus Torvalds wrote:
> >
> > Why don't we just make it do the same thing that the x86 semaphores used
> > to do: make it signed, and decrement unconditionally. And call the
> > slow-path if it became negative.
> > ...
> > and now we have an existing known-good implementation to look at?
>
> Ok, after having thought that over, and looked at the code, I think I
> like your version after all. The old implementation was pretty complex
> due to the need to be so extra careful about the count that could
> change outside of the lock, so everything considered, a new
> implementation that is simpler is probably the better choice.
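
For reference, the fast path being proposed above can be sketched roughly as
follows. This is a minimal userspace illustration using C11 atomics, not the
old x86 code itself; the struct and function names are hypothetical, and the
slow paths (sleeping and waking) are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type: a signed count that is modified without any lock. */
struct sketch_fast_sem {
	atomic_int count;	/* > 0: free slots, < 0: contended */
};

void sketch_fast_init(struct sketch_fast_sem *sem, int n)
{
	assert(n >= 0);
	atomic_init(&sem->count, n);
}

/*
 * Decrement unconditionally. Returns true when the new value is
 * negative, i.e. when the caller would have to enter the slow path.
 */
bool sketch_down_needs_slowpath(struct sketch_fast_sem *sem)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 again. */
	return atomic_fetch_sub(&sem->count, 1) - 1 < 0;
}

/*
 * Increment unconditionally. Returns true when the new value is still
 * <= 0, i.e. when somebody went negative and a wakeup would be needed.
 */
bool sketch_up_needs_wakeup(struct sketch_fast_sem *sem)
{
	return atomic_fetch_add(&sem->count, 1) + 1 <= 0;
}
```

Note that the count here changes outside of any lock, which is the property
that forces the slow path to be so careful.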

yeah. I thought about that too, the problem i found is this thing in the
old lib/semaphore-sleepers.c code's __down() path:

	remove_wait_queue_locked(&sem->wait, &wait);
	wake_up_locked(&sem->wait);
	spin_unlock_irqrestore(&sem->wait.lock, flags);
	tsk->state = TASK_RUNNING;

i once understood that mystery wakeup to be necessary for some weird
ordering reason, but it would probably be hard to justify in the new
code, because it's done unconditionally, regardless of whether there are
sleepers around.

And once we deviate from the old code, we might as well go for the
simplest approach - which also happens to be rather close to the mutex
code's current slowpath - just with the counting property added, legacy
semantics, and no lockdep coverage.
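
To illustrate what "simplest approach" means here, a rough userspace analogue
might look like the sketch below. It is not the kernel code: a pthread mutex
stands in for the spinlock, a per-waiter condition variable stands in for
sleeping in the scheduler, and all sketch_* names are hypothetical. The only
point is that the count is touched solely under the lock and that sleepers
queue up in a plain list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_waiter {
	struct sketch_waiter *next;
	pthread_cond_t cond;
	bool up;			/* set by sketch_up() on hand-over */
};

struct sketch_sem {
	pthread_mutex_t lock;		/* stands in for the spinlock */
	unsigned int count;		/* only changes under lock */
	struct sketch_waiter *head, *tail;	/* sleepers, oldest first */
};

void sketch_sem_init(struct sketch_sem *sem, unsigned int n)
{
	pthread_mutex_init(&sem->lock, NULL);
	sem->count = n;
	sem->head = sem->tail = NULL;
}

bool sketch_down_trylock(struct sketch_sem *sem)
{
	bool got;

	pthread_mutex_lock(&sem->lock);
	got = sem->count > 0;
	if (got)
		sem->count--;
	pthread_mutex_unlock(&sem->lock);
	return got;
}

void sketch_down(struct sketch_sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	if (sem->count > 0) {
		sem->count--;
	} else {
		/* Slow path: queue at the tail and sleep until handed over. */
		struct sketch_waiter w = { .next = NULL, .up = false };

		pthread_cond_init(&w.cond, NULL);
		if (sem->tail)
			sem->tail->next = &w;
		else
			sem->head = &w;
		sem->tail = &w;
		while (!w.up)
			pthread_cond_wait(&w.cond, &sem->lock);
		pthread_cond_destroy(&w.cond);
	}
	pthread_mutex_unlock(&sem->lock);
}

void sketch_up(struct sketch_sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	if (!sem->head) {
		sem->count++;		/* nobody sleeping: just count up */
	} else {
		/* Hand the slot to the oldest sleeper instead of counting. */
		struct sketch_waiter *w = sem->head;

		sem->head = w->next;
		if (!sem->head)
			sem->tail = NULL;
		w->up = true;
		pthread_cond_signal(&w->cond);
	}
	pthread_mutex_unlock(&sem->lock);
}
```

Because sketch_up() always wakes the head of the list and only when a sleeper
exists, there is no unconditional wakeup to justify, and sleepers are served
oldest first.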

> Ergo, I will just pull your scheduler tree.

great! Meanwhile 100 randconfigs booted fine with that tree, so i'd say
the implementation is robust.

i also did a quick re-test of AIM7 because the wakeup logic changed a
bit from what i tested initially (from round-robin to strict FIFO), and
as expected not much changed in the AIM7 results on the quad:

  Tasks   Jobs/Min   JTI   Real    CPU     Jobs/sec/task
  2000    55019.9    96    211.6   806.5   0.4585
  2000    55116.2    90    211.2   804.7   0.4593
  2000    55082.3    82    211.3   805.5   0.4590

this is slightly lower but the test was not fully apples to apples
because this also had some tracing active and other small details. It's
still very close to the v2.6.25 numbers. I suspect some more performance
could be won in this particular workload by getting rid of the BKL
dependency altogether.

Ingo