Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface

From: Greg KH
Date: Thu May 28 2020 - 09:13:08 EST


On Thu, May 28, 2020 at 03:01:36PM +0200, Alexander Graf wrote:
>
>
> On 27.05.20 00:24, Greg KH wrote:
> >
> > On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:
> > >
> > >
> > > On 26.05.20 15:17, Greg KH wrote:
> > > >
> > > > On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:
> > > > >
> > > > >
> > > > > On 26.05.20 14:33, Greg KH wrote:
> > > > > >
> > > > > > On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 26.05.20 08:51, Greg KH wrote:
> > > > > > > >
> > > > > > > > On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:
> > > > > > > > > +#define NE "nitro_enclaves: "
> > > > > > > >
> > > > > > > > Again, no need for this.
> > > > > > > >
> > > > > > > > > +#define NE_DEV_NAME "nitro_enclaves"
> > > > > > > >
> > > > > > > > KBUILD_MODNAME?
> > > > > > > >
> > > > > > > > > +#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
> > > > > > > > > +
> > > > > > > > > +static char *ne_cpus;
> > > > > > > > > +module_param(ne_cpus, charp, 0644);
> > > > > > > > > +MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
> > > > > > > >
> > > > > > > > Again, please do not do this.
> > > > > > >
> > > > > > > I actually asked her to put this one in specifically.
> > > > > > >
> > > > > > > The concept of this parameter is very similar to isolcpus= and maxcpus= in
> > > > > > > that it takes CPUs away from Linux and instead donates them to the
> > > > > > > underlying hypervisor, so that it can spawn enclaves using them.
> > > > > > >
> > > > > > > From an admin's point of view, this is a setting I would like to keep
> > > > > > > persisted across reboots. How would this work with sysfs?
> > > > > >
> > > > > > How about just as the "initial" ioctl command to set things up? Don't
> > > > > > grab any cpu pools until asked to. Otherwise, what happens when you
> > > > > > load this module on a system that can't support it?
> > > > >
> > > > > That would give any user with access to the enclave device the ability to
> > > > > remove CPUs from the system. That's clearly a CAP_ADMIN task in my book.
> > > >
> > > > Ok, what's wrong with that?
> > >
> > > Would you want random users to get the ability to hot unplug CPUs from your
> > > system? At unlimited quantity? I don't :).
> >
> > A random user, no, but one with admin rights, why not? They can do that
> > already today on your system, this isn't new.
> >
> > > > > Hence this whole split: The admin defines the CPU Pool, users can safely
> > > > > consume this pool to spawn enclaves from it.
> > > >
> > > > But having the admin define that at module load / boot time, is a major
> > > > pain. What tools do they have that allow them to do that easily?
> > >
> > > The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
> > > file.
> >
> > Editing grub files is horrid, come on...
>
> It's not editing grub files, it's editing template config files that then
> are used as input for grub config file generation :).
>
> > > When but at module load / boot time would you define it? I really don't want
> > > to have a device node that in theory "the world" can use which then allows
> > > any user on the system to hot unplug every CPU but 0 from my system.
> >
> > But you have that already when the PCI device is found, right? What is
> > the initial interface to the driver? What's wrong with using that?
> >
> > Or am I really missing something as to how this all fits together with
> > the different pieces? Seeing the patches as-is doesn't really provide a
> > good overview, sorry.
>
> Ok, let me walk you through the core donation process.
>
> Imagine you have a parent VM with 8 cores. Every one of those virtual cores
> is 1:1 mapped to a physical core.
>
> You enumerate the PCI device, you start working with it. None of that
> changes your topology.
>
> You now create an enclave spanning 2 cores. The hypervisor will remove the
> 1:1 map for those 2 cores and instead mark them as "free floating" on the
> remaining 6 cores. It then uses the 2 freed up cores and creates a 1:1 map
> for the enclave's 2 cores
>
> To ensure that we still see a realistic mapping of core topology, we need to
> remove those 2 cores from the parent VM's scope of execution. The way this
> is done today is by going through CPU offlining.
>
> The first and obvious option would be to offline all respective CPUs when an
> enclave gets created. But unprivileged users should be able to spawn
> enclaves. So how do I ensure that my unprivileged user doesn't just offline
> all of my parent VM's CPUs?
>
> The option implemented here is that we fold this into a two-stage approach.
> The admin reserves a "pool" of cores for enclaves to use. Unprivileged users
> can then consume cores from that pool, but not more than those.
>
> That way, unprivileged users have no influence over whether a core is
> enabled or not. They can only consume the pool of cores that was dedicated
> for enclave use.
>
> It also has the big advantage that you don't have dynamically changing CPU
> topology in your system. Long living processes that adjust their environment
> to the topology can still do so, without cores getting pulled out under
> their feet.

Ok, that makes more sense, but:

> > > > > So I really don't think an ioctl would be a great user experience. Same for
> > > > > a sysfs file - although that's probably slightly better than the ioctl.
> > > >
> > > > You already are using ioctls to control this thing, right? What's wrong
> > > > with "one more"? :)
> > >
> > > So what we *could* do is add an ioctl to set the pool size which then does a
> > > CAP_ADMIN check. That however means you now are in priority hell:
> > >
> > > A user that wants to spawn an enclave as part of an nginx service would need
> > > to create another service to set the pool size and indicate the dependency
> > > in systemd control files.
> > >
> > > Is that really better than a module parameter?
> >
> > module parameters are hard to change, and manage control over who really
> > is changing them.
>
> What is hard about
>
> $ echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus

So at runtime, after all is booted and up and going, you just ripped
cores out from under someone's feet? :)

And the code really handles writing to that value while the module is
already loaded and up and running? At a quick glance, it didn't seem
like it would handle that very well as it only is checked at ne_init()
time.

Or am I missing something?

Anyway, yes, if you can dynamically do this at runtime, that's great,
but it feels ackward to me to rely on one configuration thing as a
module parameter, and everything else through the ioctl interface.
Unification would seem to be a good thing, right?

thanks,

greg k-h