Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

From: Avi Kivity
Date: Thu May 26 2011 - 06:57:36 EST


On 05/26/2011 12:30 PM, Ingo Molnar wrote:
* Avi Kivity <avi@xxxxxxxxxx> wrote:

> > Note that tools/kvm/ would probably like to implement its own
> > object manager model as well in addition to access method
> > restrictions: by being virtual hardware it deals with many
> > resources and object hierarchies that are simply not known to the
> > host OS's LSM.
> >
> > Unlike Qemu tools/kvm/ has a design that is very fit for MAC
> > concepts: it uses separate helper threads for separate resources
> > (this could in many cases even be changed to be separate
> > processes which only share access to the guest RAM image) - while
> > Qemu is in most parts a state machine, so in tools/kvm/ we can
> > realistically have a good object manager and keep an exploit in a
> > networking interface driver from being able to access disk driver
> > state.
>
> You mean each thread will have a different security context? I
> don't see the point. All threads share all of memory so it would
> be trivial for one thread to exploit another and gain all of its
> privileges.

You are missing the geniality of the tools/kvm/ thread pool! :-)

I'm sure the thread pool is very general, but the hardware we're modelling is not.

It could be switched to a worker *process* model rather easily. Guest
RAM and (a limited amount of) global resources would be shared via
mmap(SHARED), but otherwise each worker process would have its own
stack, its own subsystem-specific state, etc.

Suppose a guest reconfigures a device's MSI page, and suppose that's handled by the device's process. Now it's not sufficient to update some global state, you have to go and tell the host kernel about it. With good privilege separation the device process would not be permitted to do that; now it has to pass a message to a process that is.

Same thing applies for BARs, reset signals, live migration, etc.

Exploiting other device domains via the shared guest RAM image is not
possible, we treat guest RAM as untrusted data already.

Right.

Devices, like real hardware devices, are functionally pretty
independent from each other, so this security model is rather natural
and makes a lot of sense.

When just pushing packets, you are right. However setup/configuration is hardly clean.

Consider a CD-ROM eject, for example. Now it can't be done by a simple callback.

> A multi process model works better but it has significant memory
> and performance overhead.

Not in Linux :-) We context-switch between processes almost as
quickly as we do between threads. With modern tagged TLB hardware
it's even faster.

Once we get PCID in, yes. There's still the message passing overhead, and unnecessary context switches. In a threaded model you can choose whether to switch threads or not, in a process model you cannot.

> (well the memory overhead is much smaller when using transparent
> huge pages, but these only work for anonymous memory).

The biggest amount of RAM is the guest RAM image - but if that is
mmap(SHARED) and mapped using hugepages then the pte overhead from a
process model is largely mitigated.

That doesn't work with memory hotplug.

Once we have a process model then isolation and MAC between devices
becomes a very real possibility: exploit via one network interface
cannot break into a disk interface.

Yes, certainly.

Maybe even the isolation and per device access control of
*same-class* devices from each other is possible: with careful
implementation of the subsystem shared data structures. (which isn't
much really)

Right, hardly at all in fact. The problem comes from the side-band issues like reset, interrupts, hotplug, and whatnot.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
