Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

From: Rik van Riel
Date: Thu Dec 27 2012 - 14:31:28 EST


On 12/27/2012 01:49 PM, Eric Dumazet wrote:
> On Thu, 2012-12-27 at 09:35 -0500, Rik van Riel wrote:
>
>> The lock acquisition time depends on the holder of the lock,
>> and what the CPUs ahead of us in line will do with the lock,
>> not on the caller IP of the spinner.
>
> That would be true only for general cases.
>
> In network land, we do have spinlock acquisition time depending on the
> context.
>
> A garbage collector usually runs for longer time than the regular fast
> path.

Won't the garbage collector running hold up the lock
acquisition time for the OTHER acquirers?

> But even without gc, its pretty often we have consumer/producers that
> don't have the same amount of work to perform per lock/unlock sections.
>
> The socket lock per example, might be held for very small sections for
> process contexts (lock_sock() / release_sock()), but longer sections
> from softirq context. Of course, severe lock contention on a socket
> seems unlikely in real workloads.

If one actor holds the lock for longer than the
others, surely it would be the others that suffer
in lock acquisition time?

>> Therefore, I am not convinced that hashing on the caller IP
>> will add much, if anything, except increasing the chance
>> that we end up not backing off when we should...
>>
>> IMHO it would be good to try keeping this solution as simple
>> as we can get away with.


> unsigned long hash = (unsigned long)lock ^
>                      (unsigned long)__builtin_return_address(1);
>
> seems simple enough to me, but I get your point.
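
To make the hashed-delay idea concrete, here is a stand-alone user-space
sketch, not the kernel patch: the table size, the extra bit mixing, and
the names delay_hash/DELAY_HASH_BITS are all invented for illustration,
and __builtin_return_address(0) stands in for the level-1 lookup above,
because here the hash is computed directly in the caller.

#include <stdio.h>

#define DELAY_HASH_BITS	5
#define DELAY_HASH_SIZE	(1 << DELAY_HASH_BITS)

/* In the kernel this table would be per CPU; one copy is shown here. */
static unsigned int spinlock_delay[DELAY_HASH_SIZE];

/*
 * Fold the lock address together with the caller's return address,
 * then reduce the result to an index into the delay table.  A
 * collision simply means two (lock, call site) pairs share one
 * tuned delay value.
 */
static unsigned int delay_hash(const void *lock, const void *caller)
{
	unsigned long hash = (unsigned long)lock ^ (unsigned long)caller;

	/* Mix the high bits down before masking off the index. */
	hash ^= hash >> DELAY_HASH_BITS;
	hash ^= hash >> (2 * DELAY_HASH_BITS);

	return hash & (DELAY_HASH_SIZE - 1);
}

int main(void)
{
	int dummy_lock;
	unsigned int idx;

	/*
	 * Level 0 is enough from main(); inside a real slowpath the
	 * interesting caller is one frame further up, hence level 1
	 * in the snippet quoted above.
	 */
	idx = delay_hash(&dummy_lock, __builtin_return_address(0));
	spinlock_delay[idx] = 1000;	/* some initial delay value */

	printf("lock %p -> delay slot %u\n", (void *)&dummy_lock, idx);
	return 0;
}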

> I also recorded the max 'delay' value reached on my machine to check how
> good MAX_SPINLOCK_DELAY value was :
>
> [   89.628265] cpu 16 delay 3710
> [   89.631230] cpu 6 delay 2930
> [   89.634120] cpu 15 delay 3186
> [   89.637092] cpu 18 delay 3789
> [   89.640071] cpu 22 delay 4012
> [   89.643080] cpu 11 delay 3389
> [   89.646057] cpu 21 delay 3123
> [   89.649035] cpu 9 delay 3295
> [   89.651931] cpu 3 delay 3063
> [   89.654811] cpu 14 delay 3335
>
> Although it makes no performance difference to use a bigger/smaller one.

I guess we want a larger value.

With your hashed lock approach, we can get away with
larger values - they will not penalize other locks
the same way a single value per cpu might have.
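
Roughly, the kind of loop we are talking about looks like the following
user-space toy (clearly not the actual patch: the MIN/MAX constants, the
+1 growth and /32 decay steps, and the helper names backoff_wait and
unlocker are all made up here; a single global stands in for per-CPU
data, and cpu_relax() is the x86 pause hint).

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define MIN_SPINLOCK_DELAY	1
#define MAX_SPINLOCK_DELAY	4100	/* a little above the ~4000 seen above */

/* Per CPU in the kernel; a single global stands in for that here. */
static int spinlock_delay = MIN_SPINLOCK_DELAY;

static inline void cpu_relax(void)
{
	__asm__ __volatile__("pause" ::: "memory");	/* x86 pause hint */
}

/*
 * Spin until the ticket lock's head reaches our ticket.  Between polls
 * of the lock word we pause roughly delay * (waiters ahead of us)
 * times, growing the delay while the lock stays contended and shrinking
 * it when we notice we overslept past our turn.
 */
static void backoff_wait(atomic_uint *head, unsigned int ticket)
{
	int delay = spinlock_delay;

	for (;;) {
		unsigned int waiters_ahead =
			ticket - atomic_load_explicit(head, memory_order_acquire);

		if (!waiters_ahead)
			break;				/* our turn */

		/* Still contended at every poll: back off a little more. */
		if (delay < MAX_SPINLOCK_DELAY)
			delay++;

		for (unsigned int loops = delay * waiters_ahead; loops; loops--)
			cpu_relax();

		/* The lock went free while we delayed: we waited too long. */
		if (atomic_load_explicit(head, memory_order_relaxed) == ticket &&
		    delay > MIN_SPINLOCK_DELAY)
			delay -= delay / 32 ? delay / 32 : 1;
	}

	spinlock_delay = delay;
}

/* Pretend to be the current lock holder running on another CPU. */
static void *unlocker(void *head)
{
	usleep(1000);				/* hold the lock for a while */
	atomic_fetch_add((atomic_uint *)head, 1);	/* pass it to ticket 1 */
	return NULL;
}

int main(void)
{
	atomic_uint head = 0;		/* ticket 0 currently holds the lock */
	pthread_t t;

	pthread_create(&t, NULL, unlocker, &head);
	backoff_wait(&head, 1);		/* we are ticket 1 */
	pthread_join(t, NULL);

	printf("acquired; tuned delay ended up at %d\n", spinlock_delay);
	return 0;
}

Build with gcc -O2 -pthread.  The point of the toy is only that a larger
MAX_SPINLOCK_DELAY ceiling stays harmless as long as oversleeping feeds
back into the tuned value.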

--
All rights reversed