Re: x86/sgx: uapi change proposal

From: Andy Lutomirski
Date: Tue Jan 08 2019 - 17:54:29 EST


On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > > >
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each enclave?
> > >
> > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used
> > > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and
> > > expose it to a VM. Proposed layout in the link below. I'll also respond to
> > > Jarkko's question about exposing EPC through /dev/sgx instead of having
> > > KVM allocate it on behalf of the VM.
> > >
> > > https://lkml.kernel.org/r/20181218185349.GC30082@xxxxxxxxxxxxxxx
> >
> > Hi Sean,
> >
> > Sorry for replying to an old email. But IMHO Qemu does not have to
> > open some /dev/sgx and allocate/mmap EPC for the guest's virtual
> > EPC slot. Instead, KVM could create a private slot, which is not visible
> > to Qemu, for virtual EPC, and KVM could call the core-SGX EPC allocation
> > API directly.
>
> That's possible, but it has several downsides.
>
> - Duplicates a lot of code in KVM for managing memory regions.
> - Artificially restricts userspace to a single EPC region, unless
>   even more code is duplicated to handle multiple private regions.
> - Requires additional ioctls() or capabilities to probe EPC support
> - Does not fit with Qemu/KVM's memory model, e.g. all other types of
>   memory are exposed to a guest through KVM_SET_USER_MEMORY_REGION.
> - Prevents userspace from debugging a guest's enclave. I'm not saying
>   this is a likely scenario, but I also don't think we should preclude
>   it without good reason.
> - KVM is now responsible for managing the lifecycle of EPC, e.g. what
>   happens if an EPC cgroup limit is lowered on a running VM and
>   KVM can't gracefully reclaim EPC? The userspace hypervisor should
>   ultimately decide how to handle such an event.
> - SGX logic is split between SGX and KVM, e.g. VA page management for
>   oversubscription will likely be common to SGX and KVM. From a long
>   term maintenance perspective, this means that changes to the EPC
>   management could potentially need to be Acked by KVM, and vice versa.
>
> > I am not sure what's the good of allowing userspace to alloc/mmap a
> > raw EPC region? Userspace is not allowed to touch EPC anyway, except
> > enclave code.
> >
> > To me, having KVM create a private EPC slot is cleaner than exposing
> > /dev/sgx/epc and allowing userspace to map some raw EPC region.
>
> Cleaner in the sense that it's faster to get basic support up and running
> since there are fewer touchpoints, but there are long term ramifications
> to cramming EPC management in KVM.
>
> And at this point I'm not stating any absolutes, e.g. how EPC will be
> handled by KVM. What I'm pushing for is to not eliminate the possibility
> of having the SGX subsystem own all EPC management, e.g. don't tie
> /dev/sgx to a single enclave.

I haven't gone and re-read all the relevant SDM bits, so I'll just
ask: what, if anything, are the semantics of mapping "raw EPC" like
this? You can't do anything with the mapping from user mode unless
you get an enclave created and initialized in it and have it mapped
at the correct linear address, right? I still think you have the
right idea, but it is a bit unusual.
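
For concreteness, here is a rough sketch of what the userspace side of
the raw-EPC flow quoted above might look like. It is only an
illustration under stated assumptions: the /dev/sgx/epc node comes from
the proposed layout and is not a merged ABI, its mmap() semantics
(offset 0, MAP_SHARED) are guessed, and the helper names epc_slot() and
map_epc_into_vm() are made up for this example. The only established
piece is KVM_SET_USER_MEMORY_REGION and its struct from <linux/kvm.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Hypothetical device node taken from the proposed layout; not a merged ABI. */
#define SGX_EPC_DEV "/dev/sgx/epc"

/*
 * Describe an EPC-backed guest memory slot exactly the way ordinary guest
 * RAM is described to KVM. (Helper name is hypothetical.)
 */
static struct kvm_userspace_memory_region
epc_slot(uint32_t slot, uint64_t gpa, uint64_t size, void *hva)
{
	struct kvm_userspace_memory_region r;

	memset(&r, 0, sizeof(r));
	r.slot = slot;
	r.guest_phys_addr = gpa;
	r.memory_size = size;
	r.userspace_addr = (uint64_t)(uintptr_t)hva;
	return r;
}

/*
 * Map raw EPC and hand it to a VM. Returns 0 on success, -1 on failure.
 * The mmap() arguments are an assumption about how such a node would behave.
 */
static int map_epc_into_vm(int vm_fd, uint32_t slot, uint64_t gpa, size_t size)
{
	struct kvm_userspace_memory_region r;
	void *hva;
	int epc_fd;

	epc_fd = open(SGX_EPC_DEV, O_RDWR);
	if (epc_fd < 0)
		return -1;

	hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, epc_fd, 0);
	close(epc_fd);	/* an established mapping stays valid after close() */
	if (hva == MAP_FAILED)
		return -1;

	r = epc_slot(slot, gpa, size, hva);
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0) {
		munmap(hva, size);
		return -1;
	}
	return 0;
}
```

Note that nothing in this flow touches the contents of the mapping from
user mode; the mapping only serves as a handle that userspace passes on
to KVM.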

I do think it makes sense to have QEMU delegate the various ENCLS
operations (especially EINIT) to the regular SGX interface, which will
mean that VM guests will have exactly the same access controls applied
as regular user programs, which is probably what we want. If so,
there will need to be a way to get INITTOKEN privilege for the purpose
of running non-Linux OSes in the VM, which isn't the end of the world.
We might still want the ioctl that does EINIT with an explicit token
to be restricted in a way that strongly discourages its use by
anything other than a hypervisor. Or I suppose we could just
straight-up ignore the guest-provided init token.

--Andy

P.S. Is Intel ever going to consider a way to make guests get their
own set of keys that are different from the host's keys and other
guests' keys?