Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

From: Rik van Riel
Date: Fri Dec 21 2012 - 21:54:07 EST

Next message: Eric Wong: "Re: epoll with ONESHOT possibly fails to deliver events"
Previous message: Rik van Riel: "Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor"
In reply to: Eric Dumazet: "Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor"
Next in thread: Eric Dumazet: "Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor"

On 12/21/2012 07:48 PM, Eric Dumazet wrote:

> On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
> > Argh, the first one had a typo in it that did not influence
> > performance with fewer threads running, but that made things
> > worse with more than a dozen threads...
> >
> > Please let me know if you can break these patches.
> > ---8<---
> > Subject: x86,smp: auto tune spinlock backoff delay factor
> >
> > +#define MIN_SPINLOCK_DELAY 1
> > +#define MAX_SPINLOCK_DELAY 1000
> > +DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };

> Using a single spinlock_delay per cpu assumes there is a single
> contended spinlock on the machine, or that contended
> spinlocks protect the same critical section.

The goal is to reduce bus traffic and to keep total
system performance from falling through the floor.

If one contended lock takes N cycles to acquire,
and a second contended lock takes 2N cycles, a
shared delay factor means the first lock gets
checked fewer times before acquisition, and the
second lock more times. That should still result
in similar average system throughput.

I suspect this approach should work well even when
there are multiple contended locks in the system.
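A minimal user-space sketch of the scheme being discussed, assuming C11 atomics, with a `_Thread_local` variable standing in for the per-CPU `spinlock_delay` from the quoted patch. The ticket-lock layout, the backoff proportional to the number of waiters ahead, and the hypothetical `tune_delay()` rule are illustrative guesses, not the code in the patch; the sketch only shows how one delay factor per CPU can be shared by every lock that CPU waits on.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bounds copied from the quoted patch. */
#define MIN_SPINLOCK_DELAY 1
#define MAX_SPINLOCK_DELAY 1000

/* User-space stand-in for DEFINE_PER_CPU(int, spinlock_delay):
 * one delay factor per thread, shared by every lock that thread waits on. */
static _Thread_local int spinlock_delay = MIN_SPINLOCK_DELAY;

struct ticket_lock {
	atomic_uint head;	/* ticket currently being served */
	atomic_uint tail;	/* next ticket to hand out */
};

/* Hypothetical tuning rule, NOT the one in the patch. */
static int tune_delay(int delay, int our_turn)
{
	if (our_turn && delay > MIN_SPINLOCK_DELAY)
		delay--;	/* lock may have been ours for a while: wait less */
	else if (!our_turn && delay < MAX_SPINLOCK_DELAY)
		delay++;	/* checked too early, wasting a cache-line read: wait more */
	assert(delay >= MIN_SPINLOCK_DELAY && delay <= MAX_SPINLOCK_DELAY);
	return delay;
}

static void ticket_spin_lock(struct ticket_lock *lock)
{
	unsigned int ticket = atomic_fetch_add(&lock->tail, 1);
	unsigned int head = atomic_load_explicit(&lock->head, memory_order_acquire);
	int delay = spinlock_delay;

	while (head != ticket) {
		/* Back off in proportion to the number of waiters ahead of us,
		 * without touching the lock's cache line. */
		unsigned long loops = (unsigned long)delay * (ticket - head);
		for (volatile unsigned long i = 0; i < loops; i++)
			;
		head = atomic_load_explicit(&lock->head, memory_order_acquire);
		delay = tune_delay(delay, head == ticket);
	}
	spinlock_delay = delay;	/* remember the tuned value for the next lock */
}

static void ticket_spin_unlock(struct ticket_lock *lock)
{
	atomic_fetch_add_explicit(&lock->head, 1, memory_order_release);
}
```

Because the factor is per thread rather than per lock, a thread that waits on both a fast and a slow lock settles on a compromise value; the argument above is that this mostly changes how often each lock is polled, not overall throughput.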

> Given that we probably know where the contended spinlocks are, couldn't
> we use a real scalable implementation for them?

The scalable locks tend to have a slightly more
complex locking API, resulting in a slightly
higher overhead in the non-contended (normal)
case. That means we cannot use them everywhere.
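For illustration, a minimal sketch of one well-known scalable lock, the MCS queue lock (Mellor-Crummey and Scott, 1991), written with C11 atomics. The names are hypothetical, not a kernel API. It shows both costs mentioned above: every caller must supply a `struct mcs_node` and keep it alive until release, and the release path needs a compare-and-swap that a plain test-and-set or ticket unlock does not.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Each waiter brings its own queue node and spins only on that node. */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;		/* 1 while this waiter must keep waiting */
};

struct mcs_lock {
	_Atomic(struct mcs_node *) tail;	/* last waiter in the queue, or NULL */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
	atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&me->locked, 1, memory_order_relaxed);

	/* Append ourselves to the queue in one atomic step. */
	struct mcs_node *prev = atomic_exchange_explicit(&lock->tail, me,
							 memory_order_acq_rel);
	if (prev == NULL)
		return;			/* queue was empty: uncontended fast path */

	atomic_store_explicit(&prev->next, me, memory_order_release);
	while (atomic_load_explicit(&me->locked, memory_order_acquire))
		;			/* spin on our own node, not on the lock */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
	/* While we hold the lock, the queue is never empty. */
	assert(atomic_load_explicit(&lock->tail, memory_order_relaxed) != NULL);

	struct mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
	if (succ == NULL) {
		struct mcs_node *expected = me;
		/* No known successor: try to mark the queue empty. */
		if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
				memory_order_acq_rel, memory_order_acquire))
			return;
		/* A successor is enqueuing; wait until it links itself in. */
		while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
			;
	}
	/* Hand the lock directly to the next waiter. */
	atomic_store_explicit(&succ->locked, 0, memory_order_release);
}
```

A caller would typically keep the node on its own stack for the duration of the critical section, e.g. `struct mcs_node n; mcs_acquire(&l, &n); ... mcs_release(&l, &n);`.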

Also, scalable locks merely make sure that N+1
CPUs perform the same as N CPUs when there is
lock contention. They do not cause the system
to actually scale.
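A back-of-the-envelope model of this point, with hypothetical parameters: N CPUs each spend `work` cycles outside the lock and `cs` cycles holding it. Even a perfect lock with free handoffs caps throughput at 1/cs acquisitions per cycle once N exceeds (work + cs)/cs; a scalable lock can at best hold that plateau as CPUs are added, while a lock whose handoff cost grows with the number of waiters can drop below it.

```c
#include <assert.h>

/* Idealized upper bound on lock acquisitions per cycle for n CPUs, each
 * doing `work` cycles outside the lock and `cs` cycles inside it.
 * Assumes zero handoff cost; real locks can only do worse. */
static double max_acquisitions_per_cycle(unsigned int n, double work, double cs)
{
	assert(cs > 0.0);
	double offered = n / (work + cs);	/* demand if nobody ever waited */
	double capacity = 1.0 / cs;		/* only one holder at a time */
	return offered < capacity ? offered : capacity;
}
```

With work = 900 and cs = 100, the bound stops growing at 10 CPUs: a 20th or 40th CPU only adds waiters, not throughput.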

For actual scalability, the data structure itself
would need to be changed, so that its locking
requirements are lighter.
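One common shape of such a change, sketched under assumptions rather than taken from this thread: replace a single lock-protected counter with per-CPU slots, so the frequent update path takes no shared lock and only the rare read path visits every slot. `NR_SLOTS`, `CACHE_LINE` and the function names are hypothetical; the kernel's own facility in this spirit is `percpu_counter`, whose details differ.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS   64	/* hypothetical: one slot per CPU */
#define CACHE_LINE 64	/* assumed cache-line size in bytes */

/* Each slot sits on its own cache line, so writers on different
 * CPUs never bounce the same line between them. */
struct slot {
	_Alignas(CACHE_LINE) atomic_long count;
};
static_assert(sizeof(struct slot) == CACHE_LINE, "one slot per cache line");

struct split_counter {
	struct slot slots[NR_SLOTS];
};

/* Fast path: touch only the caller's own slot; no shared lock at all. */
static void split_counter_add(struct split_counter *c, unsigned int cpu, long delta)
{
	atomic_fetch_add_explicit(&c->slots[cpu % NR_SLOTS].count, delta,
				  memory_order_relaxed);
}

/* Slow path: the reader pays instead, summing every slot. With writers
 * running concurrently the result is only an approximate snapshot. */
static long split_counter_read(struct split_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slots[i].count, memory_order_relaxed);
	return sum;
}
```

The trade-off is that reads become expensive and only approximately consistent, which is acceptable for statistics-style counters but not for every data structure.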

> A known contended one is the Qdisc lock in the network layer. We added
> a second lock (busylock), on a separate cache line, to lower the
> pressure a bit, but a scalable lock would be much better...
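A simplified sketch of the busylock idea as described here, not the networking code itself: contenders first serialize on a second lock that lives on its own cache line, so at most one of them at a time polls the main lock, whose cache line also holds the protected data. The 64-byte `CACHE_LINE`, the test-and-test-and-set lock, the "main lock looks held" contention hint and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache-line size in bytes */

/* Test-and-test-and-set spinlock. */
struct tts_lock {
	atomic_bool held;
};

static void tts_acquire(struct tts_lock *l)
{
	while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
		while (atomic_load_explicit(&l->held, memory_order_relaxed))
			;	/* read-only spin until the lock looks free */
}

static void tts_release(struct tts_lock *l)
{
	atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Hypothetical queue: main lock and data share one cache line,
 * busylock sits on a line of its own. */
struct busy_queue {
	_Alignas(CACHE_LINE) struct tts_lock lock;
	long len;				/* protected by lock */
	_Alignas(CACHE_LINE) struct tts_lock busylock;
};
static_assert(offsetof(struct busy_queue, busylock) >= CACHE_LINE,
	      "busylock on its own cache line");

static void busy_queue_push(struct busy_queue *q)
{
	/* Cheap contention hint: does the main lock look held right now? */
	bool contended = atomic_load_explicit(&q->lock.held, memory_order_relaxed);

	/* Contenders line up on busylock first, so at most one of them
	 * at a time polls the main lock's cache line. */
	if (contended)
		tts_acquire(&q->busylock);
	tts_acquire(&q->lock);
	if (contended)
		tts_release(&q->busylock);

	q->len++;				/* critical section */
	tts_release(&q->lock);
}
```

This only thins out the crowd on the main lock's cache line; the other contenders still wait, just on busylock instead, which is why a truly scalable lock would do better.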

My locking patches are meant for dealing with the
offenders we do not know about, to make sure that
system performance does not fall off a cliff when
we run into a surprise.

Known scalability bugs we can fix.

Unknown ones should not cause somebody's system
to fail.

> I guess there are patent issues...

At least one of the scalable lock implementations has been
known since 1991, so there should not be any patent issues
with that one.

