Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

From: Sean Christopherson
Date: Tue Mar 09 2021 - 14:43:44 EST


On Mon, Mar 08, 2021, Steve Rutherford wrote:
> On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@xxxxxxx> wrote:
> > On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > > If the guest does the hypercall after writing the page, then the guest is hosed
> > > if it gets migrated while writing the page (scenario #1):
> > >
> > > vCPU Userspace
> > > zero_bytes[0:N]
> > > <transfers written bytes as private instead of shared>
> > > <migrates vCPU>
> > > zero_bytes[N+1:4095]
> > > set_shared (dest)
> > > kaboom!
> >
> >
> > Maybe I am missing something, this is not any different from a normal
> > operation inside a guest. Making a page shared/private in the page table
> > does not update the content of the page itself. In your above case, I
> > assume zero_bytes[N+1:4095] are written by the destination VM. The
> > memory region was private in the source VM page table, so, those writes
> > will be performed encrypted. The destination VM later changed the memory
> > to shared, but nobody wrote to the memory after it has been transitioned
> > to the shared, so a reader of the memory should get ciphertext and
> > unless there was a write after the set_shared (dest).

Sorry, that wasn't clear, there's an implied page table update to make the page
shared before zero_bytes.

> > > If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> > > data by consuming a stale list (scenario #2):
> > >
> > > vCPU Userspace
> > > get_list (from KVM or internally)
> > > set_shared (src)
> > > zero_page (src)
> > > get_dirty
> > > <transfers private data instead of shared>
> > > <migrates vCPU>
> > > kaboom!
> >
> >
> > I don't remember how things are done in recent Ashish Qemu/KVM patches
> > but in previous series, the get_dirty() happens before the querying the
> > encrypted state. There was some logic in VMM to resync the encrypted
> > bitmap during the final migration stage and perform any additional data
> > transfer since last sync.

It's likely that Ashish's patches did the correct thing, I just wanted to point
out that both host and guest need to do everything in a very specific order.

> > > If both guest and host order things to avoid #1 and #2, the host can still
> > > migrate the wrong data (scenario #3):
> > >
> > > vCPU Userspace
> > > set_private
> > > zero_bytes[0:4096]
> > > get_dirty
> > > set_shared (src)
> > > get_list
> > > <transfers as shared instead of private>
> > > <migrates vCPU>
> > > set_private (dest)
> > > kaboom!
> >
> >
> > Since there was no write to the memory after the set_shared (src), so
> > the content of the page should not have changed. After the set_private
> > (dest), the caller should be seeing the same content written by the
> > zero_bytes[0:4096]
>
> I think Sean was going for the situation where the VM has moved to the
> destination, which would have changed the VEK. That way the guest
> would be decrypting the old ciphertext with the new (wrong) key.

I think that's what I was saying?

I was pointing out that the userspace VMM would see the page as "shared" and so
read the guest memory directly instead of routing it through the PSP.

Anyways, my intent wasn't to point out a known issue anywhere. I was working
through how GET_LIST would interact with GET_DIRTY_LOG to convince myself that
the various flows were bombproof. I wanted to record those thoughts so that I
can remind myself of the requirements when I inevitably forget them in the future.

> > > Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> > > conversion flow for whatever reason, after making the initial hypercall. Maybe
> > > it goes without saying, but to address #3, the guest must consider existing data
> > > as lost the instant it tells the host the page has been converted to a different
> > > type.
> > >
> > >> For the above reason if we do in-kernel hypercall handling for page
> > >> encryption status (which we probably won't require for SEV-SNP &
> > >> correspondingly there will be no hypercall exiting),
> > > As above, that doesn't preclude KVM from exiting to userspace on conversion.
> > >
> > >> then we can implement a standard GET/SET ioctl interface to get/set the guest
> > >> page encryption status for userspace, which will work across SEV, SEV-ES and
> > >> SEV-SNP.