Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved

From: Sean Christopherson
Date: Tue Nov 12 2019 - 11:57:20 EST


On Tue, Nov 12, 2019 at 11:19:44AM +0100, Paolo Bonzini wrote:
> On 12/11/19 01:51, Dan Williams wrote:
> > An elevated page reference count for file mapped pages causes the
> > filesystem (for a dax mode file) to wait for that reference count to
> > drop to 1 before allowing the truncate to proceed. For a page cache
> > backed file mapping (non-dax) the reference count is not considered in
> > the truncate path. It does prevent the page from getting freed in the
> > page cache case, but the association to the file is lost for truncate.
>
> KVM support for file-backed guest memory is limited. It is not
> completely broken, in fact cases such as hugetlbfs are in use routinely,
> but corner cases such as truncate aren't covered well indeed.

KVM's actual MMU should be ok since it coordinates with the mmu_notifier.

kvm_vcpu_map() is where KVM could run afoul of page cache truncation.
This is the other main use of hva_to_pfn*(), where KVM directly accesses
guest memory (which could be file-backed) without coordinating with the
mmu_notifier. IIUC, an ill-timed page cache truncation could result in a
write from KVM effectively being dropped due to writeback racing with
KVM's write to the page. If that's true, then I think KVM would need to
move to the proposed pin_user_pages() to ensure its "DMA" isn't lost.
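To illustrate, a rough sketch (not a real patch) of what the kvm_vcpu_map()
path might look like with FOLL_PIN semantics instead of a plain reference.
Function names follow the proposed pin_user_pages() series and may differ
from what actually lands:

```c
/*
 * Sketch only: pin a guest page for "DMA"-like access instead of
 * taking a bare page reference via hva_to_pfn*()/get_user_pages().
 * hva here stands in for the page-aligned userspace address of the
 * gfn being mapped.
 */
struct page *page;

/*
 * FOLL_PIN-style pinning tells the mm that the page is under direct
 * access, so writeback/truncate can coordinate with the pin rather
 * than racing with KVM's write.
 */
if (pin_user_pages_fast(hva, 1, FOLL_WRITE, &page) != 1)
	return -EFAULT;

/* ... KVM writes guest memory through the kernel mapping ... */

/*
 * Mark the page dirty and drop the pin together so the dirty state
 * can't be lost to a writeback that raced with the write above.
 */
unpin_user_pages_dirty_lock(&page, 1, true);
```

The point of the pin/unpin pairing is that the filesystem (dax in
particular) can see the elevated pin count and hold off truncation,
which a plain get_page() reference does not reliably do.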

> > As long as any memory the guest expects to be persistent is backed by
> > mmu-notifier coordination we're all good, otherwise an elevated
> > reference count does not coordinate with truncate in a reliable way.

KVM itself is (mostly) blissfully unaware of any such expectations. The
userspace VMM, e.g. Qemu, is ultimately responsible for ensuring the guest
sees a valid model, e.g. that persistent memory (as presented to the guest)
is actually persistent (from the guest's perspective).

The big caveat is the truncation issue above.