Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface

From: Paraschiv, Andra-Irina
Date: Thu May 28 2020 - 13:09:51 EST

Next message: Daniel Jordan: "Re: [PATCH -V3] swap: Reduce lock contention on swap cache from swap slots allocation"
Previous message: Ed Maste: "[RESEND PATCH v2] perf tools: correct license on jsmn json parser"
In reply to: Greg KH: "Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface"
Next in thread: Benjamin Herrenschmidt: "Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 28/05/2020 16:12, Greg KH wrote:

On Thu, May 28, 2020 at 03:01:36PM +0200, Alexander Graf wrote:

On 27.05.20 00:24, Greg KH wrote:

On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:

On 26.05.20 15:17, Greg KH wrote:

On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:

On 26.05.20 14:33, Greg KH wrote:

On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:

On 26.05.20 08:51, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:

+#define NE "nitro_enclaves: "

Again, no need for this.

+#define NE_DEV_NAME "nitro_enclaves"

KBUILD_MODNAME?

+#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

Again, please do not do this.

I actually asked her to put this one in specifically.

The concept of this parameter is very similar to isolcpus= and maxcpus= in
that it takes CPUs away from Linux and instead donates them to the
underlying hypervisor, so that it can spawn enclaves using them.

From an admin's point of view, this is a setting I would like to keep
persisted across reboots. How would this work with sysfs?

How about just as the "initial" ioctl command to set things up? Don't
grab any cpu pools until asked to. Otherwise, what happens when you
load this module on a system that can't support it?

That would give any user with access to the enclave device the ability to
remove CPUs from the system. That's clearly a CAP_ADMIN task in my book.

Ok, what's wrong with that?

Would you want random users to get the ability to hot unplug CPUs from your
system? At unlimited quantity? I don't :).

A random user, no, but one with admin rights, why not? They can do that
already today on your system, this isn't new.

Hence this whole split: The admin defines the CPU Pool, users can safely
consume this pool to spawn enclaves from it.

But having the admin define that at module load / boot time, is a major
pain. What tools do they have that allow them to do that easily?

The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
file.

Editing grub files is horrid, come on...

It's not editing grub files, it's editing template config files that then
are used as input for grub config file generation :).

When but at module load / boot time would you define it? I really don't want
to have a device node that in theory "the world" can use which then allows
any user on the system to hot unplug every CPU but 0 from my system.

But you have that already when the PCI device is found, right? What is
the initial interface to the driver? What's wrong with using that?

Or am I really missing something as to how this all fits together with
the different pieces? Seeing the patches as-is doesn't really provide a
good overview, sorry.

Ok, let me walk you through the core donation process.

Imagine you have a parent VM with 8 cores. Every one of those virtual cores
is 1:1 mapped to a physical core.

You enumerate the PCI device, you start working with it. None of that
changes your topology.

You now create an enclave spanning 2 cores. The hypervisor will remove the
1:1 map for those 2 cores and instead mark them as "free floating" on the
remaining 6 cores. It then uses the 2 freed up cores and creates a 1:1 map
for the enclave's 2 cores

To ensure that we still see a realistic mapping of core topology, we need to
remove those 2 cores from the parent VM's scope of execution. The way this
is done today is by going through CPU offlining.

The first and obvious option would be to offline all respective CPUs when an
enclave gets created. But unprivileged users should be able to spawn
enclaves. So how do I ensure that my unprivileged user doesn't just offline
all of my parent VM's CPUs?

The option implemented here is that we fold this into a two-stage approach.
The admin reserves a "pool" of cores for enclaves to use. Unprivileged users
can then consume cores from that pool, but not more than those.

That way, unprivileged users have no influence over whether a core is
enabled or not. They can only consume the pool of cores that was dedicated
for enclave use.

It also has the big advantage that you don't have dynamically changing CPU
topology in your system. Long living processes that adjust their environment
to the topology can still do so, without cores getting pulled out under
their feet.

Ok, that makes more sense, but:

So I really don't think an ioctl would be a great user experience. Same for
a sysfs file - although that's probably slightly better than the ioctl.

You already are using ioctls to control this thing, right? What's wrong
with "one more"? :)

So what we *could* do is add an ioctl to set the pool size which then does a
CAP_ADMIN check. That however means you now are in priority hell:

A user that wants to spawn an enclave as part of an nginx service would need
to create another service to set the pool size and indicate the dependency
in systemd control files.

Is that really better than a module parameter?

module parameters are hard to change, and manage control over who really
is changing them.

What is hard about

$ echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus

So at runtime, after all is booted and up and going, you just ripped
cores out from under someone's feet? :)

And the code really handles writing to that value while the module is
already loaded and up and running? At a quick glance, it didn't seem
like it would handle that very well as it only is checked at ne_init()
time.

Or am I missing something?

It's checked for now at module init, true.

I started with init and it remained as a TODO on my side to adapt the logic to be able to handle the setup via the sysfs file for the module.

Thanks,
Andra

Anyway, yes, if you can dynamically do this at runtime, that's great,
but it feels ackward to me to rely on one configuration thing as a
module parameter, and everything else through the ioctl interface.
Unification would seem to be a good thing, right?

thanks,

greg k-h

Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.

Next message: Daniel Jordan: "Re: [PATCH -V3] swap: Reduce lock contention on swap cache from swap slots allocation"
Previous message: Ed Maste: "[RESEND PATCH v2] perf tools: correct license on jsmn json parser"
In reply to: Greg KH: "Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface"
Next in thread: Benjamin Herrenschmidt: "Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]