[PATCH] use unfair spinlock when running on hypervisor.

From: Gleb Natapov
Date: Tue Jun 01 2010 - 05:35:49 EST


The ticket spinlock ensures fairness by introducing a FIFO of cpus waiting
for the lock to be released. This works great on real HW, but when running
on a hypervisor it introduces a very heavy performance hit if physical cpus
are overcommitted (up to 35% in my test). The reason for the performance
hit is that the vcpu at the head of the FIFO can be descheduled, in which
case no other vcpu waiting on the lock can proceed. This results in big
time gaps where no vcpu is in the critical section. Even worse, the waiters
keep spinning and prevent the vcpu at the head of the FIFO from being
scheduled to run.
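
For readers unfamiliar with the ticket scheme, here is a minimal C11 sketch
of the FIFO behavior described above (an illustrative model with hypothetical
names, not the kernel's actual code):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two tickets packed in one 16-bit word, matching the layout used below:
 * low byte = owner ("now serving"), high byte = next ticket to hand out. */
struct ticket_lock { _Atomic uint16_t slock; };

static void ticket_lock(struct ticket_lock *l)
{
    /* Like the xaddw: atomically grab a ticket and read the old word. */
    uint16_t t = atomic_fetch_add(&l->slock, 0x0100);
    uint8_t my = (uint8_t)(t >> 8);            /* our ticket number */
    while ((uint8_t)atomic_load(&l->slock) != my)
        ;                                      /* spin until owner == our ticket */
}

static void ticket_unlock(struct ticket_lock *l)
{
    /* The kernel uses a single-byte "incb" so the owner byte cannot carry
     * into the ticket byte; a plain fetch_add(1) glosses over that wrap. */
    atomic_fetch_add(&l->slock, 1);
}
```

A descheduled vcpu is one whose ticket has come up while it is not running:
every later ticket holder spins in ticket_lock() until that vcpu runs again.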

The patch below patches the ticket spinlock code to behave similarly to the
old unfair spinlock when a hypervisor is detected. After patching, the
unlocked condition remains the same (next == owner), but if a vcpu detects
that the lock is already taken it spins until (next == owner == 0), and when
that condition is met it tries to reacquire the lock. Unlock simply zeroes
head and tail. Trylock stays the same since the unlocked condition stays
the same.
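
The patched behavior can be sketched the same way in C11 (again an
illustrative model with hypothetical names, not the asm from the diff):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Same packed layout as the ticket lock: low byte = owner, high byte = next. */
struct ticket_lock { _Atomic uint16_t slock; };

static void unfair_lock(struct ticket_lock *l)
{
    for (;;) {
        /* Grab a ticket; if the lock was free, next == owner and we own it. */
        uint16_t t = atomic_fetch_add(&l->slock, 0x0100);
        if ((uint8_t)(t >> 8) == (uint8_t)t)
            return;
        /* Lock was held: don't queue.  Spin until unlock resets the whole
         * word to zero (next == owner == 0), then race for a fresh ticket. */
        while (atomic_load(&l->slock) != 0)
            ;
    }
}

static void unfair_unlock(struct ticket_lock *l)
{
    atomic_store(&l->slock, 0);   /* zero head and tail, as in the patch */
}
```

The key change is that a waiter no longer waits for its own ticket; any
running vcpu that observes the zeroed word can win the lock, which trades
fairness for progress when the previous holder is descheduled.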

I ran two guests with 16 vcpus each, one with this patch and one without.
Inside those guests I ran a kernel compilation ("time make -j 16 all").
Here are the results of two runs:

patched:                  unpatched:
real    13m34.674s        real    19m35.827s
user    96m2.638s         user    102m38.665s
sys     56m14.991s        sys     158m22.470s

real    13m3.768s         real    19m4.375s
user    95m34.509s        user    111m9.903s
sys     53m40.550s        sys     141m59.370s

Note that for the last 6 minutes the unpatched guest ran without cpu
oversubscription, since the patched guest was already idle.

Running kernbench on the host with and without the patch:

Unpatched: Patched:
Average Half load -j 8 Run (std deviation):
Elapsed Time 312.538 (0.779404) Elapsed Time 311.974 (0.258031)
User Time 2154.89 (6.62261) User Time 2151.94 (1.96702)
System Time 233.78 (1.96642) System Time 236.472 (0.695572)
Percent CPU 763.8 (1.64317) Percent CPU 765 (0.707107)
Context Switches 39348.2 (1542.87) Context Switches 41522.4 (1193.13)
Sleeps 871688 (5596.52) Sleeps 877317 (1135.83)

Average Optimal load -j 16 Run (std deviation):
Elapsed Time 259.878 (0.444826) Elapsed Time 258.842 (0.413122)
User Time 2549.22 (415.685) User Time 2538.42 (407.383)
System Time 261.084 (28.8145) System Time 263.847 (28.8638)
Percent CPU 1003.4 (252.566) Percent CPU 1003.5 (251.407)
Context Switches 228418 (199308) Context Switches 228894 (197533)
Sleeps 898721 (28757.2) Sleeps 902020 (26074.5)

Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..b919b54 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -60,19 +60,27 @@

static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- short inc = 0x0100;
+ short inc;

asm volatile (
+ "1:\t\n"
+ "mov $0x100, %0\n\t"
LOCK_PREFIX "xaddw %w0, %1\n"
- "1:\t"
+ "2:\t"
"cmpb %h0, %b0\n\t"
- "je 2f\n\t"
+ "je 4f\n\t"
+ "3:\t\n"
"rep ; nop\n\t"
- "movb %1, %b0\n\t"
/* don't need lfence here, because loads are in-order */
- "jmp 1b\n"
- "2:"
- : "+Q" (inc), "+m" (lock->slock)
+ ALTERNATIVE(
+ "movb %1, %b0\n\t"
+ "jmp 2b\n",
+ "nop", X86_FEATURE_HYPERVISOR)"\n\t"
+ "cmpw $0, %1\n\t"
+ "jne 3b\n\t"
+ "jmp 1b\n\t"
+ "4:"
+ : "=Q" (inc), "+m" (lock->slock)
:
: "memory", "cc");
}
@@ -98,10 +106,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)

static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
- asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
- : "+m" (lock->slock)
- :
- : "memory", "cc");
+ asm volatile(
+ ALTERNATIVE(UNLOCK_LOCK_PREFIX "incb (%0);"ASM_NOP3,
+ UNLOCK_LOCK_PREFIX "movw $0, (%0)",
+ X86_FEATURE_HYPERVISOR)
+ :
+ : "Q" (&lock->slock)
+ : "memory", "cc");
}
#else
#define TICKET_SHIFT 16
--
Gleb.