Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor

From: Gleb Natapov
Date: Wed Jul 31 2013 - 02:25:56 EST


On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
> On 07/25/2013 03:08 PM, Raghavendra K T wrote:
> >On 07/25/2013 02:45 PM, Gleb Natapov wrote:
> >>On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
> >>>On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> >>>>On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> >>>>>On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> >>>>>>On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >>>>>>>On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>>>>>>>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>>>>>>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>>>>>>>+static void kvm_lock_spinning(struct arch_spinlock *lock,
> >>>>>>>>>>__ticket_t want)
> >>>>>>>>[...]
> >>>>>>>>>>+
> >>>>>>>>>>+ /*
> >>>>>>>>>>+ * halt until it's our turn and kicked. Note that we do safe
> >>>>>>>>>>halt
> >>>>>>>>>>+ * for irq enabled case to avoid hang when lock info is
> >>>>>>>>>>overwritten
> >>>>>>>>>>+ * in irq spinlock slowpath and no spurious interrupt occur
> >>>>>>>>>>to save us.
> >>>>>>>>>>+ */
> >>>>>>>>>>+ if (arch_irqs_disabled_flags(flags))
> >>>>>>>>>>+ halt();
> >>>>>>>>>>+ else
> >>>>>>>>>>+ safe_halt();
> >>>>>>>>>>+
> >>>>>>>>>>+out:
> >>>>>>>>>So here now interrupts can be either disabled or enabled. Previous
> >>>>>>>>>version disabled interrupts here, so are we sure it is safe to
> >>>>>>>>>have them
> >>>>>>>>>enabled at this point? I do not see any problem yet, will keep
> >>>>>>>>>thinking.
> >>>>>>>>
> >>>>>>>>If we enable interrupt here, then
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>+ cpumask_clear_cpu(cpu, &waiting_cpus);
> >>>>>>>>
> >>>>>>>>and if we start serving lock for an interrupt that came here,
> >>>>>>>>cpumask clear and w->lock=null may not happen atomically.
> >>>>>>>>if irq spinlock does not take slow path we would have non null
> >>>>>>>>value
> >>>>>>>>for lock, but with no information in waitingcpu.
> >>>>>>>>
> >>>>>>>>I am still thinking what would be problem with that.
> >>>>>>>>
> >>>>>>>Exactly, for kicker waiting_cpus and w->lock updates are
> >>>>>>>non atomic anyway.
> >>>>>>>
> >>>>>>>>>>+ w->lock = NULL;
> >>>>>>>>>>+ local_irq_restore(flags);
> >>>>>>>>>>+ spin_time_accum_blocked(start);
> >>>>>>>>>>+}
> >>>>>>>>>>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>>>>>>>>>+
> >>>>>>>>>>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>>>>>>>>>+static void kvm_unlock_kick(struct arch_spinlock *lock,
> >>>>>>>>>>__ticket_t ticket)
> >>>>>>>>>>+{
> >>>>>>>>>>+ int cpu;
> >>>>>>>>>>+
> >>>>>>>>>>+ add_stats(RELEASED_SLOW, 1);
> >>>>>>>>>>+ for_each_cpu(cpu, &waiting_cpus) {
> >>>>>>>>>>+ const struct kvm_lock_waiting *w =
> >>>>>>>>>>&per_cpu(lock_waiting, cpu);
> >>>>>>>>>>+ if (ACCESS_ONCE(w->lock) == lock &&
> >>>>>>>>>>+ ACCESS_ONCE(w->want) == ticket) {
> >>>>>>>>>>+ add_stats(RELEASED_SLOW_KICKED, 1);
> >>>>>>>>>>+ kvm_kick_cpu(cpu);
> >>>>>>>>>What about using NMI to wake sleepers? I think it was
> >>>>>>>>>discussed, but
> >>>>>>>>>forgot why it was dismissed.
> >>>>>>>>
> >>>>>>>>I think I have missed that discussion. 'll go back and check. so
> >>>>>>>>what is the idea here? we can easily wake up the halted vcpus that
> >>>>>>>>have interrupt disabled?
> >>>>>>>We can of course. IIRC the objection was that NMI handling path
> >>>>>>>is very
> >>>>>>>fragile and handling NMI on each wakeup will be more expensive then
> >>>>>>>waking up a guest without injecting an event, but it is still
> >>>>>>>interesting
> >>>>>>>to see the numbers.
> >>>>>>>
> >>>>>>
> >>>>>>Haam, now I remember, We had tried request based mechanism. (new
> >>>>>>request like REQ_UNHALT) and process that. It had worked, but had
> >>>>>>some
> >>>>>>complex hacks in vcpu_enter_guest to avoid guest hang in case of
> >>>>>>request cleared. So had left it there..
> >>>>>>
> >>>>>>https://lkml.org/lkml/2012/4/30/67
> >>>>>>
> >>>>>>But I do not remember performance impact though.
> >>>>>No, this is something different. Wakeup with NMI does not need KVM
> >>>>>changes at
> >>>>>all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
> >>>>>
> >>>>
> >>>>True. It was not NMI.
> >>>>just to confirm, are you talking about something like this to be
> >>>>tried ?
> >>>>
> >>>>apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
> >>>
> >>>When I started benchmark, I started seeing
> >>>"Dazed and confused, but trying to continue" from unknown nmi error
> >>>handling.
> >>>Did I miss anything (because we did not register any NMI handler)? or
> >>>is it that spurious NMIs are trouble because we could get spurious NMIs
> >>>if next waiter already acquired the lock.
> >>There is a default NMI handler that tries to detect the reason why NMI
> >>happened (which is no so easy on x86) and prints this message if it
> >>fails. You need to add logic to detect spinlock slow path there. Check
> >>bit in waiting_cpus for instance.
> >
> >aha.. Okay. will check that.
>
> yes. Thanks.. that did the trick.
>
> I did like below in unknown_nmi_error():
> if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
> return;
>
> But I believe you asked NMI method only for experimental purpose to
> check the upperbound. because as I doubted above, for spurious NMI
> (i.e. when unlocker kicks when waiter already got the lock), we would
> still hit unknown NMI error.
>
> I had hit spurious NMI over 1656 times over entire benchmark run.
> along with
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
> long to run: 24.886 msecs etc...
>
I wonder why this happens.

> (and we cannot get away with that too because it means we bypass the
> unknown NMI error even in genuine cases too)
>
> Here was the result for the my dbench test( 32 core machine with 32
> vcpu guest HT off)
>
> ---------- % improvement --------------
> pvspinlock pvspin_ipi pvpsin_nmi
> dbench_1x 0.9016 0.7442 0.7522
> dbench_2x 14.7513 18.0164 15.9421
> dbench_3x 14.7571 17.0793 13.3572
> dbench_4x 6.3625 8.7897 5.3800
>
> So I am seeing over 2-4% improvement with IPI method.
>
Yeah, this was expected.

> Gleb,
> do you think the current series looks good to you? [one patch I
> have resent with in_nmi() check] or do you think I have to respin the
> series with IPI method etc. or is there any concerns that I have to
> address. Please let me know..
>
The current code looks fine to me.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/