Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

From: David Woodhouse
Date: Wed Dec 02 2020 - 07:21:38 EST


On Wed, 2020-12-02 at 10:44 +0000, Joao Martins wrote:
> [late response - was on holiday yesterday]
>
> On 12/2/20 12:40 AM, Ankur Arora wrote:
> > On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> > > On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
> > > > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > > > +{
> > > > + struct shared_info *shared_info;
> > > > + struct page *page;
> > > > +
> > > > + page = gfn_to_page(kvm, gfn);
> > > > + if (is_error_page(page))
> > > > + return -EINVAL;
> > > > +
> > > > + kvm->arch.xen.shinfo_addr = gfn;
> > > > +
> > > > + shared_info = page_to_virt(page);
> > > > + memset(shared_info, 0, sizeof(struct shared_info));
> > > > + kvm->arch.xen.shinfo = shared_info;
> > > > + return 0;
> > > > +}
> > > > +
> > >
> > > Hm.
> > >
> > > How come we get to pin the page and directly dereference it every time,
> > > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > > instead?
> >
> > So looking at my WIP trees from the time, this is something that
> > we went back and forth on as well with using just a pinned page or a
> > persistent kvm_vcpu_map().
> >
> > I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> > as shared_info is created early and is not expected to change during the
> > lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> > MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> > kvm_vcpu_unmap() dance or do some kind of synchronization.
> >
> > That said, I don't think this code explicitly disallows any updates
> > to shared_info.
> >
> > >
> > > If that was allowed, wouldn't it have been a much simpler fix for
> > > CVE-2019-3016? What am I missing?
> >
> > Agreed.
> >
> > Perhaps, Paolo can chime in with why KVM never uses pinned page
> > and always prefers to do cached mappings instead?
> >
>
> Part of the CVE fix to not use cached versions.
>
> It's not a longterm pin of the page unlike we try to do here (partly due to the nature
> of the pages we are mapping) but we still we map the gpa, RMW the steal time struct, and
> then unmap the page.
>
> See record_steal_time() -- but more specifically commit b043138246 ("x86/KVM: Make sure
> KVM_VCPU_FLUSH_TLB flag is not missed").
>
> But I am not sure it's a good idea to follow the same as record_steal_time() given that
> this is a fairly sensitive code path for event channels.

Right. We definitely need to use atomic RMW operations (like the CVE
fix did) so the page needs to be *mapped*.

My question was about a permanent pinned mapping vs the map/unmap as we
need it that record_steal_time() does.

On IRC, Paolo told me that permanent pinning causes problems for memory
hotplug, and pointed me at the trick we do with an MMU notifier and
kvm_vcpu_reload_apic_access_page().

I'm going to stick with the pinning we have for the moment, and just
fix up the fact that it leaks the pinned pages if the guest sets the
shared_info address more than once.

At some point the apic page MMU notifier thing can be made generic, and
we can use that for this and for KVM steal time too.

Attachment: smime.p7s
Description: S/MIME cryptographic signature