Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

From: Xiao Guangrong
Date: Fri Apr 27 2012 - 01:53:17 EST


On 04/27/2012 07:45 AM, Marcelo Tosatti wrote:


>> +static bool
>> +fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> + u64 *sptep, u64 spte)
>> +{
>> + gfn_t gfn;
>> +
>> + spin_lock(&vcpu->kvm->mmu_lock);
>> +
>> + /* The spte has been changed. */
>> + if (*sptep != spte)
>> + goto exit;
>> +
>> + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>> +
>> + *sptep = spte | PT_WRITABLE_MASK;
>> + mark_page_dirty(vcpu->kvm, gfn);
>> +
>> +exit:
>> + spin_unlock(&vcpu->kvm->mmu_lock);
>
> There was a misunderstanding.


Sorry, Marcelo! It is still not clear for me now. :(

> The suggestion is to change codepaths that
> today assume that a side effect of holding mmu_lock is that sptes are
> not updated or read before attempting to introduce any lockless spte
> updates.
>
> example)
>
> u64 mask, old_spte = *sptep;
>
> if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
> __update_clear_spte_fast(sptep, new_spte);
> else
> old_spte = __update_clear_spte_slow(sptep, new_spte);
>
> The local old_spte copy might be stale by the
> time "spte_has_volatile_bits(old_spte)" reads the writable bit.
>
> example)
>
>
> VCPU0 VCPU1
> set_spte
> mmu_spte_update decides fast write
> mov newspte to sptep
> (non-locked write instruction)
> newspte in store buffer
>
> lockless spte read path reads stale value
>
> spin_unlock(mmu_lock)
>
> Depending on the what stale value CPU1 read and what decision it took
> this can be a problem, say the new bits (and we do not want to verify
> every single case). The spte write from inside mmu_lock should always be
> atomic now?
>
> So these details must be exposed to the reader, they are hidden now
> (note mmu_spte_update is already a maze, its getting too deep).
>


Actually, in this patch, all the spte update is under mmu-lock, and we
lockless-ly read spte , but the spte will be verified again after holding
mmu-lock.

+ spin_lock(&vcpu->kvm->mmu_lock);
+
+ /* The spte has been changed. */
+ if (*sptep != spte)
+ goto exit;
+
+ gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+ *sptep = spte | PT_WRITABLE_MASK;
+ mark_page_dirty(vcpu->kvm, gfn);
+
+exit:
+ spin_unlock(&vcpu->kvm->mmu_lock);

Is not the same as both read/update spte under mmu-lock?

Hmm, this is what you want?

+static bool
+fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ u64 *sptep, u64 spte)
+{
+ gfn_t gfn;
+
+ gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+ mmu_spte_update(sptep, spte | PT_WRITABLE_MASK);
+ mark_page_dirty(vcpu->kvm, gfn);
+
+ return true;
+}
+
+/*
+ * Return value:
+ * - true: let the vcpu to access on the same address again.
+ * - false: let the real page fault path to fix it.
+ */
+static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+ int level, u32 error_code)
+{
+ struct kvm_shadow_walk_iterator iterator;
+ struct kvm_mmu_page *sp;
+ bool ret = false;
+ u64 spte = 0ull;
+
+ if (!page_fault_can_be_fast(vcpu, gfn, error_code))
+ return false;
+
+ spin_lock(&vcpu->kvm->mmu_lock);
+
+ for_each_shadow_entry(vcpu, gva, iterator) {
+ spte = *iterator.sptep;
+
+ if (!is_shadow_present_pte(spte) || iterator.level < level)
+ break;
+ }
+
+ /*
+ * If the mapping has been changed, let the vcpu fault on the
+ * same address again.
+ */
+ if (!is_rmap_spte(spte)) {
+ ret = true;
+ goto exit;
+ }
+
+ if (!is_last_spte(spte, level))
+ goto exit;
+
+ /*
+ * Check if it is a spurious fault caused by TLB lazily flushed.
+ *
+ * Need not check the access of upper level table entries since
+ * they are always ACC_ALL.
+ */
+ if (is_writable_pte(spte)) {
+ ret = true;
+ goto exit;
+ }
+
+ /*
+ * Currently, to simplify the code, only the spte write-protected
+ * by dirty-log can be fast fixed.
+ */
+ if (!spte_wp_by_dirty_log(spte))
+ goto exit;
+
+ sp = page_header(__pa(iterator.sptep));
+
+ ret = fast_pf_fix_spte(vcpu, sp, iterator.sptep, spte);
+
+exit:
+ spin_unlock(&vcpu->kvm->mmu_lock);
+
+ return ret;
+}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/